ADAPTIVE SAMPLING METHOD FOR IMPROVED CONTROL IN
SEMICONDUCTOR MANUFACTURING

TECHNICAL FIELD 

This invention relates generally to semiconductor fabrication technology, and, more particularly, to a
method for semiconductor fabrication supervision and optimization.

BACKGROUND ART 

There is a constant drive within the semiconductor industry to increase the quality, reliability and
throughput of integrated circuit devices, e.g., microprocessors, memory devices, and the like. This drive is fueled
by consumer demands for higher quality computers and electronic devices that operate more reliably. These
demands have resulted in a continual improvement in the manufacture of semiconductor devices, e.g., transistors,
as well as in the manufacture of integrated circuit devices incorporating such transistors. Additionally, reducing
defects in the manufacture of the components of a typical transistor also lowers the overall cost per transistor as
well as the cost of integrated circuit devices incorporating such transistors.

The technologies underlying semiconductor processing tools have attracted increased attention over the
last several years, resulting in substantial refinements. However, despite the advances made in this area, many of
the processing tools that are currently commercially available suffer certain deficiencies. In particular, such tools
often lack advanced process data monitoring capabilities, such as the ability to provide historical parametric data
in a user-friendly format, as well as event logging, real-time graphical display of both current processing
parameters and the processing parameters of the entire run, and remote, i.e., local site and worldwide, monitoring.
These deficiencies can engender nonoptimal control of critical processing parameters, such as throughput
accuracy, stability and repeatability, processing temperatures, mechanical tool parameters, and the like. This
variability manifests itself as within-run disparities, run-to-run disparities and tool-to-tool disparities that can
propagate into deviations in product quality and performance, whereas an improved monitoring and diagnostics
system for such tools would provide a means of monitoring this variability, as well as providing means for
optimizing control of critical parameters.

Run-to-run control as practiced in high-volume, multi-product semiconductor manufacturing does not
easily fit into the framework of traditional approaches to process control. A typical approach defines a process
model with a given set of states, inputs, and outputs. In some cases, the model is static, and in others, the model
changes over time. At each time step, inputs and disturbances affect the states, and outputs are measured. Then,
the controller makes an update and the process repeats. One reason this approach is not always applicable is that
there are often multiple processing tools as well as multiple products. In addition, of all the measurements
important to a process, only a subset are generally made on each run. Determining how to do controller updates
in this environment can be a challenging task.

A run-to-run controller relies on having a process model that is consistently correct from run to run.
When the various processes run on the tool are significantly different, the controller may behave unexpectedly
because a change to a new process can appear to be a large disturbance. In addition, it may take several
successive runs of a given process for the controller to stabilize, but manufacturing constraints may prevent this
from happening. It is desirable that the controller would determine optimal settings for all processes that must run
on the tool, regardless of the order in which they appear.

An example of a system that exhibits this behavior is the chemical mechanical planarization (CMP) of
inter-layer dielectric (ILD) layers. Due to differences in pattern density and processing history, each layer/product
combination processes at a different rate. In addition, as each product is qualified to run on several toolsets, there
are also systematic variations caused by differences between the tools. Thus, one of the many control problems is
to determine the optimal settings for each product/layer/tool combination that arises. Additionally, the
measurements that provide the controller with information (such as measurements of removal from product
wafers and/or test wafer qualification events) are provided at asynchronous intervals based on operational rules
without regard to the control problems.

Other parameters it would be useful to monitor and control are process parameters related to rapid
thermal processing (RTP). Examples of such process parameters include the temperatures and lamp power levels
that silicon wafers and/or workpieces are exposed to during the rapid thermal processing (RTP) used to activate
dopant implants, for example. The rapid thermal processing (RTP) performance typically degrades with
consecutive process runs, in part due to drift in the respective settings of the rapid thermal processing (RTP) tool
and/or the rapid thermal processing (RTP) sensors. This may cause differences in wafer processing between
successive runs or batches or lots of wafers, leading to decreased satisfactory wafer throughput, decreased
reliability, decreased precision and decreased accuracy in the semiconductor manufacturing process.

However, traditional statistical process control (SPC) techniques are often inadequate to control
precisely process parameters related to rapid thermal processing (RTP) in semiconductor and microelectronic
device manufacturing so as to optimize device performance and yield. Typically, statistical process control (SPC)
techniques set a target value, and a spread about the target value, for the process parameters related to rapid
thermal processing (RTP). The statistical process control (SPC) techniques then attempt to minimize the
deviation from the target value without automatically adjusting and adapting the respective target values to
optimize the semiconductor device performance, and/or to optimize the semiconductor device yield and
throughput. Furthermore, blindly minimizing non-adaptive processing spreads about target values may not
increase processing yield and throughput.

Traditional control techniques are frequently ineffective in reducing off-target processing and in
improving sort yields. For example, wafer electrical test (WET) measurements are typically not performed on
processed wafers until quite a long time after the wafers have been processed, sometimes not until weeks later.
When one or more of the processing steps are producing resulting wafers that the wafer electrical test (WET)
measurements indicate are unacceptable, causing the resulting wafers to be scrapped, this misprocessing goes
undetected and uncorrected for quite a while, often for weeks, leading to many scrapped wafers, much wasted
material and decreased overall throughput.

Metrology operations require a significant amount of capital and consume large amounts of cycle time
in semiconductor manufacturing. Optimizing metrology may therefore significantly improve "fab" capital
requirements and operating costs. However, traditional methods of optimization are often based on ad hoc
decisions and/or, in some cases, on careful statistical analysis to determine a "best" sampling rate for a given
process/operation, balancing the improvements in control associated with increased sampling against the
increased costs of such sampling.

The present invention is directed to overcoming, or at least reducing the effects of, one or more of the
problems set forth above.

DISCLOSURE OF INVENTION

In one aspect of the present invention, a method is provided, the method comprising sampling at least
one parameter characteristic of processing performed on a workpiece in at least one processing step, and
modeling the at least one characteristic parameter sampled using an adaptive sampling processing model, treating
sampling as an integrated part of a dynamic control environment, varying the sampling based upon at least one of
situational information, upstream events and requirements of run-to-run controllers. The method also comprises
applying the adaptive sampling processing model to modify the processing performed in the at least one
processing step.

In another aspect of the present invention, a computer-readable, program storage device is provided,
encoded with instructions that, when executed by a computer, perform a method, the method comprising
sampling at least one parameter characteristic of processing performed on a workpiece in at least one processing
step, and modeling the at least one characteristic parameter sampled using an adaptive sampling processing
model, treating sampling as an integrated part of a dynamic control environment, varying the sampling based
upon at least one of situational information, upstream events and requirements of run-to-run controllers. The
method also comprises applying the adaptive sampling processing model to modify the processing performed in
the at least one processing step.

In yet another aspect of the present invention, a computer programmed to perform a method is provided,
the method comprising sampling at least one parameter characteristic of processing performed on a workpiece in
at least one processing step, and modeling the at least one characteristic parameter sampled using an adaptive
sampling processing model, treating sampling as an integrated part of a dynamic control environment, varying
the sampling based upon at least one of situational information, upstream events and requirements of run-to-run
controllers. The method also comprises applying the adaptive sampling processing model to modify the
processing performed in the at least one processing step.

In another aspect of the present invention, a system is provided, the system comprising a tool for
sampling at least one parameter characteristic of processing performed on a workpiece in at least one processing
step, and a computer for modeling the at least one characteristic parameter sampled using an adaptive sampling
processing model, treating sampling as an integrated part of a dynamic control environment, varying the
sampling based upon at least one of situational information, upstream events and requirements of run-to-run
controllers. The system also comprises a controller for applying the adaptive sampling processing model to
modify the processing performed in the at least one processing step.

In yet another aspect of the present invention, a device is provided, the device comprising means for
sampling at least one parameter characteristic of processing performed on a workpiece in at least one processing
step, and means for modeling the at least one characteristic parameter sampled using an adaptive sampling
processing model, treating sampling as an integrated part of a dynamic control environment, varying the
sampling based upon at least one of situational information, upstream events and requirements of run-to-run
controllers. The device also comprises means for applying the adaptive sampling processing model to modify the
processing performed in the at least one processing step.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention may be understood by reference to the following description taken in conjunction with the
accompanying drawings, in which the leftmost significant digit(s) in the reference numerals denote(s) the first
figure in which the respective reference numerals appear, and in which:

Figures 1-30 schematically illustrate various embodiments of a method for manufacturing according to
the present invention; and, more particularly:

Figures 1 and 3-10 schematically illustrate a flow chart for various illustrative embodiments of a
method according to the present invention;

Figure 2 schematically illustrates in cross-section an AST SHS 2800 rapid thermal processing (RTP)
tool representative of those used in various illustrative embodiments of the present invention;

Figure 11 schematically illustrates a method for fabricating a semiconductor device practiced in
accordance with the present invention;

Figure 12 schematically illustrates workpieces being processed using a processing tool, using a plurality
of control input signals, in accordance with the present invention;

Figures 13-14 schematically illustrate one particular embodiment of the process and tool in Figure 12;

Figure 15 schematically illustrates one particular embodiment of the method of Figure 11 that may be
practiced with the process and tool of Figures 13-14;

Figures 16 and 17 schematically illustrate first and second Principal Components for respective rapid
thermal processing data sets;

Figures 18 and 19 schematically illustrate geometrically Principal Components Analysis for respective
rapid thermal processing data sets;

Figures 20-23 schematically illustrate geometrically polynomial least-squares fitting, in accordance with
the present invention;

Figure 24 schematically illustrates simulation of product switching;

Figure 25 schematically illustrates percent deviation from target: hypothetical best case;

Figure 26 schematically illustrates percent deviation from target: "fixed outputs" case;

Figure 27 schematically illustrates percent deviation from target: "predicted outputs" case;

Figure 28 schematically illustrates percent deviation from target: "predicted outputs" case with extra
qualifications;

Figure 29 schematically illustrates percent deviation from target: large-scale system; and

Figure 30 is a simplified block diagram of a manufacturing system in accordance with various
illustrative embodiments of the present invention.

While the invention is susceptible to various modifications and alternative forms, specific embodiments
thereof have been shown by way of example in the drawings and are herein described in detail. It should be
understood, however, that the description herein of specific embodiments is not intended to limit the invention to
the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and
alternatives falling within the spirit and scope of the invention as defined by the appended claims.

MODE(S) FOR CARRYING OUT THE INVENTION 

Illustrative embodiments of the invention are described below. In the interest of clarity, not all features
of an actual implementation are described in this specification. It will of course be appreciated that in the
development of any such actual embodiment, numerous implementation-specific decisions must be made to
achieve the developers' specific goals, such as compliance with system-related and business-related constraints,
which will vary from one implementation to another. Moreover, it will be appreciated that such a development
effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of
ordinary skill in the art having the benefit of this disclosure.

Illustrative embodiments of a method according to the present invention are shown in Figures 1-30. As
shown in Figure 1, a workpiece 100, such as a semiconducting substrate or wafer, having zero, one, or more
process layers and/or semiconductor devices, such as a metal-oxide-semiconductor (MOS) transistor disposed
thereon, is delivered to a processing tool 105. In the processing tool 105, rapid thermal processing, for example,
such as a rapid thermal anneal, may be performed on the workpiece 100.

Figure 2 schematically illustrates in cross-section a rapid thermal anneal (RTA) tool 200, e.g., an AST
SHS 2800 rapid thermal anneal (RTA) tool, that may be used as the rapid thermal processing (RTP) tool 105 in
various illustrative embodiments according to the present invention. Various alternative illustrative embodiments
of the present invention may use rapid thermal anneal (RTA) tools (such as the Centura® RTP) manufactured by
Applied Materials (AMAT), which are quite different in physical form, usage, and measured parameters, but
which may, nevertheless, be used as the rapid thermal processing (RTP) tool 105. Still other various alternative
illustrative embodiments of the present invention may use an etching tool and/or a planarizing tool and/or a
deposition tool, and the like, as the processing tool 105.

As shown in Figure 2, the illustrative rapid thermal anneal (RTA) tool 200 may heat a workpiece 100,
such as a semiconducting silicon wafer with zero, one, or more process layers formed thereon, by using an array
of halogen lamps 210 disposed above and below the workpiece 100. The workpiece 100 may be disposed on
quartz pins and a wafer stand 215 within a quartz tube 220 heated by the array of halogen lamps 210. The wafer
stand 215 may include other components, such as an AST Hot Liner™. The temperature of the quartz tube 220
may be measured by a thermocouple and/or a pyrometer 230 that measures the temperature of the AST Hot
Liner™ component of the wafer stand 215 and/or a separate pyrometer (not shown). The quartz tube 220 may
have a quartz window 225 disposed therein below the wafer stand 215. The temperature of the AST Hot Liner™
component of the wafer stand 215, and, indirectly, the workpiece 100 may be measured through the quartz
window 225 by the pyrometer 230 disposed below the quartz window 225. Alternatively, the pyrometer 230
disposed below the quartz window 225 may directly measure the temperature of the workpiece 100. The lamp
power of the halogen lamps 210 may also be monitored and controlled.

As shown in Figure 3, the processing tool 105 may communicate with a monitoring step 110 and other
processing steps 140 via bidirectional connections through a system communications bus 160. As shown in
Figure 3, the system communications bus 160 also provides communications between the processing tool 105,
the monitoring step 110 and other processing steps 140, and an Advanced Process Control (APC) system 120,
more fully described below.

As shown in Figure 4, the workpiece 100 is sent from the processing tool 105 and delivered to the
monitoring step 110. In the monitoring step 110, one or more processing tool variables and/or one or more
processing parameters during one or more processing runs may be monitored and/or measured. Such tool
variables and/or processing parameters may comprise one or more pyrometer trace readings, one or more lamp
power trace readings, one or more tube temperature trace readings, one or more current readings, one or more
infrared (IR) signal readings, one or more optical emission spectrum readings, one or more process gas
temperature readings, one or more process gas pressure readings, one or more process gas flow rate readings, one
or more etch depths, one or more process layer thicknesses, one or more resistivity readings, and the like. As
shown in Figure 4, the monitoring step 110 may communicate with the processing tool 105 via the system
communications bus 160. As shown in Figure 4, the system communications bus 160 also provides
communications between the processing tool 105, the monitoring step 110, and the Advanced Process Control
(APC) system 120, more fully described below.

WO 02/23289 PCT/US01/28003

As shown in Figure 5, the workpiece 100 progresses from the monitoring step 110 to the other
processing steps 140. In the other processing steps 140, other processing may be performed on the workpiece 100
to produce the finished workpiece 100. In alternative illustrative embodiments, the workpiece 100 sent from the
monitoring step 110 may be the finished workpiece 100, in which case, there may not be other processing
steps 140. As shown in Figure 5, the other processing steps 140 may communicate with the monitoring step 110
via the system communications bus 160. As shown in Figure 5, the system communications bus 160 also
provides communications between the monitoring step 110, the other processing steps 140, and the Advanced
Process Control (APC) system 120, more fully described below.

As shown in Figure 6, monitored sensor data 115 is sent from the monitoring step 110 and delivered to
the Advanced Process Control (APC) system 120. As shown in Figure 6, the Advanced Process Control (APC)
system 120 may communicate with the monitoring step 110 via the system communications bus 160. Delivering
the monitored sensor data 115 to the Advanced Process Control (APC) system 120 produces an output
signal 125.

As shown in Figure 7, the output signal 125 is sent from the Advanced Process Control (APC)
system 120 and delivered to an adaptive sampling processing modeling with model predictive control (MPC) or
proportional-integral-derivative (PID) tuning step 130. In the adaptive sampling processing modeling with model
predictive control (MPC) or proportional-integral-derivative (PID) tuning step 130, the monitored sensor
data 115 may be used in an adaptive sampling processing model, appropriate for the processing performed on the
workpiece 100 in the processing tool 105. In various alternative illustrative embodiments of the present
invention, an adaptive sampling processing modeling step 130 may be provided without model predictive control
(MPC) tuning or proportional-integral-derivative (PID) tuning.

For example, such adaptive sampling processing models may provide a significant improvement in
sampling methodology by treating sampling as an integrated part of the dynamic control environment of
Advanced Process Control (APC) systems. Rather than applying a static "optimum" sampling rate, sampling is
treated as a dynamic variable that is increased or decreased based upon (1) situational information, such as the
amount and/or rate of change in the variation in recent data, (2) events, such as maintenance and/or changes in
the process upstream of the operation, and/or (3) requirements of closed-loop run-to-run controllers in their
schemes to identify control model parameters. The use of the monitored sensor data 115 in an adaptive sampling
processing model produces one or more processing recipe adjustments 145.
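
By way of illustration only, the three influences on the sampling rate listed above may be sketched as a small
function. The names `SamplingPolicy` and `adaptive_sampling_rate`, the numeric defaults, and the particular
scaling rules are assumptions made for this sketch and are not taken from the present description.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class SamplingPolicy:
    """Hypothetical policy settings; the values are placeholders, not source data."""
    base_rate: float = 0.2        # fraction of workpieces measured in steady state
    min_rate: float = 0.05        # never sample less than this fraction
    max_rate: float = 1.0         # measure every workpiece
    variation_limit: float = 1.0  # recent standard deviation regarded as normal


def adaptive_sampling_rate(recent_errors, upstream_event, controller_needs_data,
                           policy=None):
    """Return the fraction of upcoming workpieces to measure."""
    policy = policy or SamplingPolicy()
    rate = policy.base_rate

    # (1) Situational information: scale the rate with the recent variation,
    #     but do not cut it below half the base rate on this basis alone.
    if len(recent_errors) >= 2:
        spread = pstdev(recent_errors)
        rate *= max(spread / policy.variation_limit, 0.5)

    # (2) Events: maintenance or an upstream process change raises the rate.
    if upstream_event:
        rate = max(rate, 2.0 * policy.base_rate)

    # (3) Controller requirements: a run-to-run controller that is still
    #     identifying model parameters asks for every workpiece to be measured.
    if controller_needs_data:
        rate = policy.max_rate

    # Keep the result inside the allowed band.
    return min(max(rate, policy.min_rate), policy.max_rate)
```

In this sketch a quiet process is sampled below the base rate, a noisy one above it, and either an upstream
event or a controller request overrides the variation-based value.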

In various illustrative embodiments, an adaptive sampling processing model may be built by various
illustrative techniques, as described more fully below. Such an adaptive sampling processing model may also be
formed by monitoring one or more processing tool variables and/or one or more processing parameters during
one or more processing runs. As described above, examples of such processing tool variables and/or processing
parameters may comprise one or more pyrometer trace readings, one or more lamp power trace readings, one or
more tube temperature trace readings, one or more current readings, one or more infrared (IR) signal readings,
one or more optical emission spectrum readings, one or more process gas temperature readings, one or more
process gas pressure readings, one or more process gas flow rate readings, one or more etch depths, one or more
process layer thicknesses, one or more resistivity readings, and the like. In these various illustrative
embodiments, building the adaptive sampling processing models may comprise fitting the collected processing
data using at least one of polynomial curve fitting, least-squares fitting, polynomial least-squares fitting,
non-polynomial least-squares fitting, weighted least-squares fitting, weighted polynomial least-squares fitting,
weighted non-polynomial least-squares fitting, Partial Least Squares (PLS), and Principal Components Analysis
(PCA), as described more fully below.
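
As a minimal sketch of one of the fitting options named above, weighted polynomial least-squares fitting may
be carried out by solving the normal equations. The function name `polyfit_weighted` and the sample data are
hypothetical, and the normal-equation approach is chosen here only for brevity; it is not stated to be the
fitting procedure of the present invention.

```python
def polyfit_weighted(xs, ys, degree, weights=None):
    """Weighted polynomial least-squares fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree] of c0 + c1*x + c2*x**2 + ...
    """
    n = degree + 1
    if weights is None:
        weights = [1.0] * len(xs)

    # Normal equations (A^T W A) c = A^T W y for the Vandermonde matrix A.
    ata = [[sum(w * x ** (i + j) for x, w in zip(xs, weights)) for j in range(n)]
           for i in range(n)]
    aty = [sum(w * y * x ** i for x, y, w in zip(xs, ys, weights)) for i in range(n)]

    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for row in range(col + 1, n):
            factor = ata[row][col] / ata[col][col]
            for k in range(col, n):
                ata[row][k] -= factor * ata[col][k]
            aty[row] -= factor * aty[col]

    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(ata[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (aty[row] - tail) / ata[row][row]
    return coeffs


# Hypothetical data lying exactly on 1 + 2*x + 0.5*x**2.
sample_x = [0.0, 1.0, 2.0, 3.0, 4.0]
sample_y = [1.0 + 2.0 * x + 0.5 * x * x for x in sample_x]
fitted = polyfit_weighted(sample_x, sample_y, degree=2)
```

For high polynomial degrees or widely spread data the normal equations become ill-conditioned, so an
orthogonal-decomposition solver would usually be preferred in practice.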

In various illustrative embodiments, the adaptive sampling processing model may incorporate at least
one model predictive control (MPC) controller, or at least one proportional-integral-derivative (PID) controller,
having at least one tuning parameter. In various of these illustrative embodiments, the adaptive sampling
processing model, appropriate for processing, may incorporate at least one closed-loop model predictive control
(MPC) controller, or at least one closed-loop proportional-integral-derivative (PID) controller, having at least one
tuning parameter. The model predictive control (MPC) controller or the proportional-integral-derivative (PID)
controller tuning parameter(s) may be optimized based on an objective function that minimizes undesirable
processing conditions in the processing performed on the workpiece 100 in the processing tool 105.

An optimal control problem is to determine the set of inputs that extremize (minimize or maximize) an
objective function while satisfying the constraints of the system model and any additional process requirements.
Mathematically, this may be described by $\min_{u} f(x,u,t)$ subject to the constraint(s) that $g_i(x,u,t) \ge 0$,
where $x$ represents the system state variables (such as deviations from target values, uncertainty in parameter
estimates, cost(s) of material(s) needed, and the like), $u$ represents the alterable input(s), $t$ represents the time, and
$i$ labels the constraint(s). These mathematical relations may appear to be very simple, but they are very general
and are not limited to describing simple systems. The constraint equations may include differential equations
and/or difference equations that govern the process(es) as well as the operating limits that are imposed on the
process(es) input(s) and state(s).

For most real processes, this problem results in a set of nonlinear differential equations with mixed
boundary conditions. Optimal solutions have been derived for some simple process models. One class of such
problems is linear (model), quadratic (objective function), Gaussian (noise) systems (LQG systems). For linear
quadratic Gaussian (LQG) systems, an optimal controller may be derived. In general, for real processes, a
sub-optimal controller may have to suffice, since the "true" model of the system is either unknown and/or too
complicated to have an analytic solution. One approach is to assume the system is a linear quadratic Gaussian
(LQG) system and to use the corresponding linear controller as an approximate solution.
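
For illustration, the regulator half of such a linear controller may be sketched for a scalar discrete-time
model $x_{k+1} = a\,x_k + b\,u_k$ with stage cost $q\,x_k^2 + r\,u_k^2$. The model form, the function name
`scalar_lqr_gain`, and the numeric values are assumptions made for this sketch; the state estimator that a
full linear quadratic Gaussian (LQG) design would add for noisy measurements is omitted.

```python
def scalar_lqr_gain(a, b, q, r, iterations=500):
    """Iterate the discrete-time Riccati recursion for x[k+1] = a*x[k] + b*u[k].

    The cost is the sum over k of q*x[k]**2 + r*u[k]**2.
    Returns (gain, p), where the control law is u[k] = -gain * x[k]
    and p is the converged Riccati value.
    """
    p = q
    for _ in range(iterations):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    gain = a * b * p / (r + b * b * p)
    return gain, p


# Hypothetical drifting process (a = 1) with a modest input penalty.
gain, p = scalar_lqr_gain(a=1.0, b=0.5, q=1.0, r=0.1)
closed_loop_pole = 1.0 - 0.5 * gain  # a - b*gain
```

A larger input penalty `r` yields a smaller gain and a slower, gentler correction, which is the kind of
trade-off the objective function is meant to express.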

For example, a model predictive control (MPC) controller or a proportional-integral-derivative (PID)
controller may be designed to generate an output that causes some corrective effort to be applied to the
processing performed on the workpiece 100 in the processing tool 105 to drive one or more measurable
processing tool variables and/or one or more processing parameters toward a respective desired value known as the
setpoint. The model predictive control (MPC) controller or the proportional-integral-derivative (PID) controller
may generate the output that causes the corrective effort by monitoring and/or measuring and/or observing the
error between the setpoint and a measurement of the respective processing tool variable(s) and/or processing
parameter(s).

For example, a proportional-integral-derivative (PID) controller may look at the current value of the
error $e(t)$, the integral of the error $e(t)$ over a recent time interval, and the current value of the derivative of the
error $e(t)$ with respect to time to determine how much of a correction to apply and for how long. Multiplying each
of these terms by a respective tuning constant and adding them together generates the
proportional-integral-derivative (PID) controller current output $CO(t)$ given by the expression

$$CO(t) = P\,e(t) + I\left(\int e(t)\,dt\right) + D\left(\frac{d}{dt}\,e(t)\right),$$

where $P$ is the proportional tuning constant, $I$ is the integral
tuning constant, $D$ is the derivative tuning constant, and the error $e(t)$ is the difference between the setpoint $SP(t)$
and the process variable $PV(t)$ at time $t$, $e(t) = SP(t) - PV(t)$. If the current error $e(t)$ is large and/or the error $e(t)$ has
been large for a long time and/or the current error $e(t)$ is changing rapidly, the current controller output $CO(t)$
may also be large. However, if the current error $e(t)$ is small, the error $e(t)$ has been small for a long time, and the
current error $e(t)$ is changing slowly, the current controller output $CO(t)$ may also be small.
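
A discrete-time sketch of the expression above is given below for illustration. The class name
`PIDController`, the rectangular integration, the backward-difference derivative, and the first-order process
used in the demonstration are assumptions of this sketch, not details of the present description.

```python
class PIDController:
    """Discrete form of CO = P*e + I*integral(e) + D*de/dt, with e = SP - PV."""

    def __init__(self, p, i, d, dt):
        self.p, self.i, self.d, self.dt = p, i, d, dt
        self.integral = 0.0
        self.previous_error = None

    def update(self, setpoint, process_variable):
        error = setpoint - process_variable
        # Rectangular approximation of the integral of the error.
        self.integral += error * self.dt
        # Backward-difference derivative; zero on the first call.
        if self.previous_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.previous_error) / self.dt
        self.previous_error = error
        return self.p * error + self.i * self.integral + self.d * derivative


def simulate_first_order(controller, setpoint=1.0, steps=2000):
    """Drive a hypothetical process dPV/dt = -PV + CO and return the final PV."""
    pv = 0.0
    for _ in range(steps):
        co = controller.update(setpoint, pv)
        pv += controller.dt * (-pv + co)
    return pv


final_pv = simulate_first_order(PIDController(p=2.0, i=1.0, d=0.05, dt=0.01))
```

With these illustrative constants the integral term removes the steady-state offset, so the simulated process
variable settles close to the setpoint.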

In various alternative illustrative embodiments, the proportional-integral-derivative (PID) controller
current output $CO(t)$ may be given by the alternative expression

$$CO(t) = P\left[e(t) + \frac{1}{T_I}\left(\int e(t)\,dt\right) - T_D\left(\frac{d}{dt}\,PV(t)\right)\right],$$

where $P$ is an overall tuning constant, $T_I$ is the integral
time tuning constant, $T_D$ is the derivative time tuning constant, and the error $e(t)$ is the difference between the
setpoint $SP(t)$ and the process variable $PV(t)$ at time $t$, $e(t) = SP(t) - PV(t)$. In these alternative illustrative
embodiments, there are fewer abrupt changes in the proportional-integral-derivative (PID) controller current
output $CO(t)$ when there is a change to the setpoint $SP(t)$, due to the dependence on the time derivative of the
process variable $PV(t)$, rather than on the time derivative of the error $e(t) = SP(t) - PV(t)$.
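
The benefit of differentiating the process variable rather than the error may be seen in a short sketch.
The class name `PIDOnMeasurement`, the discretization, and the numeric values are assumptions made for
illustration only.

```python
class PIDOnMeasurement:
    """Discrete form of CO = P*(e + integral(e)/T_I - T_D*dPV/dt)."""

    def __init__(self, p, t_i, t_d, dt):
        self.p, self.t_i, self.t_d, self.dt = p, t_i, t_d, dt
        self.integral = 0.0
        self.previous_pv = None

    def update(self, setpoint, process_variable):
        error = setpoint - process_variable
        self.integral += error * self.dt
        # The derivative acts on the measurement, so a setpoint step
        # does not produce a spike in the output.
        if self.previous_pv is None:
            pv_rate = 0.0
        else:
            pv_rate = (process_variable - self.previous_pv) / self.dt
        self.previous_pv = process_variable
        return self.p * (error + self.integral / self.t_i - self.t_d * pv_rate)


controller = PIDOnMeasurement(p=2.0, t_i=10.0, t_d=1.0, dt=0.1)
before_step = controller.update(0.0, 0.0)  # setpoint and measurement both at zero
after_step = controller.update(1.0, 0.0)   # setpoint steps to one, measurement unchanged
```

Here the output right after the setpoint step contains only the proportional and integral contributions.
A derivative taken on the error would have added a term of size `p * t_d / dt` for that one sample.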

The proportional-integral-derivative (PID) controller current output $CO(t)$ tuning constants $P$, $I$, and $D$,
and/or $P$, $T_I$, and $T_D$, may be tuned appropriately. Using aggressively large values for the tuning constants $P$, $I$,
and $D$, and/or $P$, $T_I$, and $T_D$, may amplify the error $e(t)$ and overcompensate and overshoot the setpoint(s). Using
conservatively small values for the tuning constants $P$, $I$, and $D$, and/or $P$, $T_I$, and $T_D$, may reduce the error $e(t)$
too slowly and undercompensate and undershoot the setpoint(s). Appropriately tuned
proportional-integral-derivative (PID) controller current output $CO(t)$ tuning constants $P$, $I$, and $D$, and/or $P$, $T_I$,
and $T_D$, may lie between these two extremes. The proportional-integral-derivative (PID) controller current output
$CO(t)$ tuning constants $P$, $I$, and $D$, and/or $P$, $T_I$, and $T_D$, may be tuned appropriately using trial-and-error
tweaking, using a more rigorous analytical approach involving mathematical modeling, as described more fully
below, and/or using techniques such as the Ziegler-Nichols "open loop" and "closed loop" tuning techniques.
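
As a hedged illustration of the "closed loop" technique, the classic Ziegler-Nichols rule derives the tuning
constants from the ultimate gain (the proportional-only gain at which the loop oscillates steadily) and the
period of that oscillation. The function name and the example numbers below are hypothetical. The rule
supplies only a starting point that is normally refined afterwards.

```python
def ziegler_nichols_closed_loop(ultimate_gain, ultimate_period):
    """Classic Ziegler-Nichols closed-loop PID rule.

    P = 0.6*Ku, T_I = Tu/2, T_D = Tu/8. The equivalent constants for the
    P, I, D form follow as I = P/T_I and D = P*T_D.
    """
    p = 0.6 * ultimate_gain
    t_i = 0.5 * ultimate_period
    t_d = 0.125 * ultimate_period
    return {"P": p, "T_I": t_i, "T_D": t_d, "I": p / t_i, "D": p * t_d}


# Hypothetical measurement: sustained oscillation at gain 10 with a 4-second period.
tuning = ziegler_nichols_closed_loop(ultimate_gain=10.0, ultimate_period=4.0)
```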

The adaptive sampling processing modeling of the monitored sensor data 115 in the adaptive sampling
processing modeling with model predictive control (MPC) or proportional-integral-derivative (PID) tuning
step 130, may be used to alert an engineer of the need to adjust the processing performed in any of a variety of
processing steps, such as the processing tool 105 and/or the other processing steps 140. The engineer may also
alter and/or adjust, for example, the setpoints for the processing performed in the processing tool 105, and/or the
processing tool variable(s) and/or processing parameter(s) monitored and/or measured in the monitoring step 110.

As shown in Figure 8, a feedback control signal 135 may be sent from the adaptive sampling processing
modeling with model predictive control (MPC) or proportional-integral-derivative (PID) tuning step 130 to the
processing tool 105 to adjust the processing performed in the processing tool 105. In various alternative
illustrative embodiments, the feedback control signal 135 may be sent from the adaptive sampling processing
modeling with model predictive control (MPC) or proportional-integral-derivative (PID) tuning step 130 to any
of the other processing steps 140 to adjust the processing performed in any of the other processing steps 140, for
example, via the system communications bus 160 that provides communications between the processing tool 105,
the monitoring step 110, the other processing steps 140, and the Advanced Process Control (APC) system 120,
more fully described below.

As shown in Figure 9, in addition to, and/or instead of, the feedback control signal 135, the one or more
processing recipe adjustments 145, and/or an entire appropriate recipe based upon this analysis, may be sent from
the adaptive sampling processing modeling with model predictive control (MPC) or
proportional-integral-derivative (PID) tuning step 130 to a processing process change and control step 150. In the
processing process change and control step 150, the one or more processing recipe adjustments 145 may be used
in a high-level supervisory control loop. Thereafter, as shown in Figure 10, a feedback control signal 155 may be
sent from the processing process change and control step 150 to the processing tool 105 to adjust the processing
performed in the processing tool 105. In various alternative illustrative embodiments, the feedback control
signal 155 may be sent from the processing process change and control step 150 to any of the other processing
steps 140 to adjust the processing performed in any of the other processing steps 140, for example, via the system
communications bus 160 that provides communications between the processing tool 105, the monitoring
step 110, the other processing steps 140, and the Advanced Process Control (APC) system 120, more fully
described below.

In various illustrative embodiments, the engineer may be provided with advanced process data
monitoring capabilities, such as the ability to provide historical parametric data in a user-friendly format, as well
as event logging, real-time graphical display of both current processing parameters and the processing parameters
of the entire run, and remote, i.e., local site and worldwide, monitoring. These capabilities may engender more
optimal control of critical processing parameters, such as throughput accuracy, stability and repeatability,
processing temperatures, mechanical tool parameters, and the like. This more optimal control of critical
processing parameters reduces this variability. This reduction in variability manifests itself as fewer within-run
disparities, fewer run-to-run disparities and fewer tool-to-tool disparities. This reduction in the number of these
disparities that can propagate means fewer deviations in product quality and performance. In such an illustrative
embodiment of a method of manufacturing according to the present invention, a monitoring and diagnostics
system may be provided that monitors this variability and optimizes control of critical parameters.

Figure 11 illustrates one particular embodiment of a method 1100 practiced in accordance with the
present invention. Figure 12 illustrates one particular apparatus 1200 with which the method 1100 may be
practiced. For the sake of clarity, and to further an understanding of the invention, the method 1100 shall be
disclosed in the context of the apparatus 1200. However, the invention is not so limited and admits wide
variation, as is discussed further below.

Referring now to both Figures 11 and 12, a batch or lot of workpieces or wafers 1205 is being processed
through a processing tool 1210. The processing tool 1210 may be any processing tool known to the art,
particularly if it comprises the requisite control capabilities. The processing tool 1210 comprises a processing
tool controller 1215 for this control purpose. The nature and function of the processing tool controller 1215 will
be implementation specific.

For instance, the processing tool controller 1215 may control processing control input parameters such
as processing recipe control input parameters and/or setpoints. Four workpieces 1205 are shown in Figure 12, but
the lot of workpieces or wafers, i.e., the "wafer lot," may be any practicable number of wafers from one to any
finite number.

The method 1100 begins, as set forth in box 1120, by sampling one or more parameters characteristic of
the processing performed on the workpiece 1205 in the processing tool 1210. The nature, identity, and
measurement of characteristic parameters will be largely implementation specific and even tool specific. For
instance, capabilities for monitoring process parameters vary, to some degree, from tool to tool. Greater sensing
capabilities may permit wider latitude in the characteristic parameters that are identified and measured and the
manner in which this is done. Conversely, lesser sensing capabilities may restrict this latitude. In turn, the
processing control input parameters such as the processing recipe control input parameters and/or the setpoints
for workpiece temperature and/or lamp power and/or anneal time and/or process gas temperature and/or process
gas pressure and/or process gas flow rate and/or radio frequency (RF) power and/or etch time and/or bias voltage
and/or deposition time, and the like, may directly affect the effective yield of usable semiconductor devices from
the workpiece 1205.

Turning to Figure 12, in this particular embodiment, the processing process characteristic parameters are 
measured and/or monitored by tool sensors (not shown). The outputs of these tool sensors are transmitted to a 
computer system 1230 over a line 1220. The computer system 1230 analyzes these sensor outputs to identify the
characteristic parameters. 

Returning to Figure 11, once the characteristic parameter is identified and measured, the method 1100
proceeds by modeling the measured and identified characteristic parameter(s) using an adaptive sampling 
processing model (as described more fully below), as set forth in box 1130. The computer system 1230 in 
Figure 12 is, in this particular embodiment, programmed to model the characteristic parameter(s). The manner in 
which this modeling occurs will be implementation specific. 

In the embodiment of Figure 12, a database 1235 stores a plurality of models that might potentially be 
applied, depending upon which characteristic parameter is measured. This particular embodiment, therefore, 
requires some a priori knowledge of the characteristic parameters that might be measured. The computer 
system 1230 then extracts an appropriate model from the database 1235 of potential models to apply to the 
measured characteristic parameters. If the database 1235 does not include an appropriate model, then the 
characteristic parameter may be ignored, or the computer system 1230 may attempt to develop one, if so 
programmed. The database 1235 may be stored on any kind of computer-readable, program storage medium,
such as an optical disk 1240, a floppy disk 1245, or a hard disk drive (not shown) of the computer system 1230.
The database 1235 may also be stored on a separate computer system (not shown) that interfaces with the
computer system 1230.

Modeling of the measured characteristic parameter may be implemented differently in alternative
embodiments. For instance, the computer system 1230 may be programmed using some form of artificial
intelligence to analyze the sensor outputs and controller inputs to develop a model on-the-fly in a real-time
implementation. This approach might be a useful adjunct to the embodiment illustrated in Figure 12, and
discussed above, where characteristic parameters are measured and identified for which the database 1235 has no
appropriate model.
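The model selection described above (extract a stored model when one exists, otherwise ignore the parameter or attempt to develop a model) can be sketched as follows. The dictionary standing in for the database 1235, the parameter names, the example gains, and the function names are all hypothetical and serve only to illustrate the control flow.

```python
from typing import Callable, Dict, Optional

# A model maps a measured characteristic parameter to a control input correction.
Model = Callable[[float], float]

# Hypothetical stand-in for the database 1235: models keyed by parameter name.
MODEL_DATABASE: Dict[str, Model] = {
    "film_thickness": lambda measured: -0.5 * measured,
    "etch_depth": lambda measured: -0.2 * measured,
}


def select_model(parameter: str,
                 database: Dict[str, Model],
                 develop: Optional[Callable[[str], Model]] = None) -> Optional[Model]:
    """Return a stored model, a newly developed one, or None to ignore the parameter."""
    model = database.get(parameter)
    if model is not None:
        return model                  # an appropriate stored model exists
    if develop is not None:
        return develop(parameter)     # "if so programmed": try to build one on the fly
    return None                       # otherwise the parameter is ignored


def develop_identity_model(parameter: str) -> Model:
    """Hypothetical on-the-fly developer; a real one would fit a model to sensor data."""
    return lambda measured: 0.0
```

Returning None for an unknown parameter corresponds to ignoring it; passing a `develop` callable corresponds to the on-the-fly alternative.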

The method 1100 of Figure 11 then proceeds by applying the model to modify at least one processing
control input parameter, as set forth in box 1140. Depending on the implementation, applying the model may
yield either a new value for a processing control input parameter or a correction to an existing processing control
input parameter. In various illustrative embodiments, a multiplicity of control input recipes may be stored and an
appropriate one of these may be selected based upon one or more of the determined parameters. The new
processing control input is then formulated from the value yielded by the model and is transmitted to the
processing tool controller 1215 over the line 1220. The processing tool controller 1215 then controls subsequent
processing process operations in accordance with the new processing control inputs.

Some alternative embodiments may employ a form of feedback to improve the modeling of
characteristic parameters. The implementation of this feedback is dependent on several disparate factors, including
the tool's sensing capabilities and economics. One technique for doing this would be to monitor at least one
effect of the model's implementation and update the model based on the effect(s) monitored. The update may
also depend on the model. For instance, a linear model may require a different update than would a non-linear
model, all other factors being the same.

As is evident from the discussion above, some features of the present invention may be implemented in 
software. For instance, the acts set forth in the boxes 1120-1140 in Figure 11 are, in the illustrated embodiment, 
software-implemented, in whole or in part. Thus, some features of the present invention are implemented as
instructions encoded on a computer-readable, program storage medium. The program storage medium may be of
any type suitable to the particular implementation. However, the program storage medium will typically be
magnetic, such as the floppy disk 1245 or the computer 1230 hard disk drive (not shown), or optical, such as the
optical disk 1240. When these instructions are executed by a computer, they perform the disclosed functions. The
computer may be a desktop computer, such as the computer 1230. However, the computer might alternatively be
a processor embedded in the processing tool 1210. The computer might also be a laptop, a workstation, or a
mainframe in various other embodiments. The scope of the invention is not limited by the type or nature of the
program storage medium or computer with which embodiments of the invention might be implemented. 

Thus, some portions of the detailed descriptions herein are, or may be, presented in terms of algorithms, 
functions, techniques, and/or processes. These terms enable those skilled in the art most effectively to convey the 
substance of their work to others skilled in the art. These terms are here, and are generally, conceived to be a
self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations
of physical quantities. Usually, though not necessarily, these quantities take the form of electromagnetic signals 
capable of being stored, transferred, combined, compared, and otherwise manipulated. 

It has proven convenient at times, principally for reasons of common usage, to refer to these signals as 
bits, values, elements, symbols, characters, terms, numbers, and the like. All of these and similar terms are to be
associated with the appropriate physical quantities and are merely convenient labels applied to these quantities
and actions. Unless specifically stated otherwise, or as may be apparent from the discussion, terms such as
"processing," "computing," "calculating," "determining," "displaying," and the like, used herein refer to the
action(s) and processes of a computer system, or similar electronic and/or mechanical computing device, that 
manipulates and transforms data, represented as physical (electromagnetic) quantities within the computer 
system's registers and/or memories, into other data similarly represented as physical quantities within the 
computer system's memories and/or registers and/or other such information storage, transmission and/or display 
devices. 

Construction of an Illustrative Apparatus. An exemplary embodiment 1300 of the apparatus 1200 in
Figure 12 is illustrated in Figures 13-14, in which the apparatus 1300 comprises a portion of an Advanced
Process Control ("APC") system. Figures 13-14 are conceptualized, structural and functional block diagrams,
respectively, of the apparatus 1300. A set of processing steps is performed on a lot of workpieces 1305 on a
processing tool 1310. Because the apparatus 1300 is part of an Advanced Process Control (APC) system, the
workpieces 1305 are processed on a run-to-run basis. Thus, process adjustments are made and held constant for
the duration of a run, based on run-level measurements or averages. A "run" may be a lot, a batch of lots, or even
an individual wafer.

In this particular embodiment, the workpieces 1305 are processed by the processing tool 1310 and 
various operations in the process are controlled by a plurality of processing control input signals on a line 1320
between the processing tool 1310 and a workstation 1330. Exemplary processing control inputs for this
embodiment might include those for the setpoints for workpiece temperature, lamp power, anneal time, process
gas temperature, process gas pressure, process gas flow rate, radio frequency (RF) power, etch time, bias voltage,
deposition time, and the like. 

When a process step in the processing tool 1310 is concluded, the semiconductor workpieces 1305 being 
processed in the processing tool 1310 are examined at a review station 1317. The review station 1317 need not be
part of the processing tool 1310, but may, for example, be a separate tool and/or station. The processing control
inputs generally affect the characteristic parameters of the semiconductor workpieces 1305 measured at the
review station 1317, and, hence, the variability and properties of the acts performed by the processing tool 1310
on the workpieces 1305. Once errors are determined from the examination after the run of a lot of
workpieces 1305, the processing control inputs on the line 1320 are modified for a subsequent run of a lot of
workpieces 1305. Modifying the control signals on the line 1320 is designed to improve the next processing
performed by the processing tool 1310. The modification is performed in accordance with one particular
embodiment of the method 1100 set forth in Figure 11, as described more fully below. Once the relevant 
processing control input signals for the processing tool 1310 are updated, the processing control input signals 
with new settings are used for a subsequent run of semiconductor devices. 

Referring now to both Figures 13 and 14, the processing tool 1310 communicates with a manufacturing 
framework comprising a network of processing modules. One such module is an Advanced Process Control
(APC) system manager 1440 resident on the computer 1340. This network of processing modules constitutes the
Advanced Process Control (APC) system. The processing tool 1310 generally comprises an equipment
interface 1410 and a sensor interface 1415. A machine interface 1430 resides on the workstation 1330. The
machine interface 1430 bridges the gap between the Advanced Process Control (APC) framework, e.g., the
Advanced Process Control (APC) system manager 1440, and the equipment interface 1410. Thus, the machine
interface 1430 interfaces the processing tool 1310 with the Advanced Process Control (APC) framework and
supports machine setup, activation, monitoring, and data collection. The sensor interface 1415 provides the
appropriate interface environment to communicate with external sensors such as LabView® or other sensor
bus-based data acquisition software. Both the machine interface 1430 and the sensor interface 1415 use a set of
functionalities (such as a communication standard) to collect data to be used. The equipment interface 1410 and
the sensor interface 1415 communicate over the line 1320 with the machine interface 1430 resident on the
workstation 1330. 

More particularly, the machine interface 1430 receives commands, status events, and collected data from 
the equipment interface 1410 and forwards these as needed to other Advanced Process Control (APC)
components and event channels. In turn, responses from Advanced Process Control (APC) components are
received by the machine interface 1430 and rerouted to the equipment interface 1410. The machine
interface 1430 also reformats and restructures messages and data as necessary. The machine interface 1430
supports the startup/shutdown procedures within the Advanced Process Control (APC) System Manager 1440. It
also serves as an Advanced Process Control (APC) data collector, buffering data collected by the equipment
interface 1410, and emitting appropriate data collection signals.

In the particular embodiment illustrated, the Advanced Process Control (APC) system is a factory-wide
software system, but this is not necessary to the practice of the invention. The control strategies taught by the
present invention can be applied to virtually any semiconductor processing tool on a factory floor. Indeed, the
present invention may be simultaneously employed on multiple processing tools in the same factory or in the
same fabrication process. The Advanced Process Control (APC) framework permits remote access and
monitoring of the process performance. Furthermore, by utilizing the Advanced Process Control (APC)
framework, data storage can be more convenient, more flexible, and less expensive than data storage on local 
drives. However, the present invention may be employed, in some alternative embodiments, on local drives. 

The illustrated embodiment deploys the present invention onto the Advanced Process Control (APC) 
framework utilizing a number of software components. In addition to components within the Advanced Process
Control (APC) framework, a computer script is written for each of the semiconductor processing tools involved
in the control system. When a semiconductor processing tool in the control system is started in the semiconductor
manufacturing fab, the semiconductor processing tool generally calls upon a script to initiate the action that is
required by the processing tool controller. The control methods are generally defined and performed using these
scripts. The development of these scripts can comprise a significant portion of the development of a control
system.

In this particular embodiment, there are several separate software scripts that perform the tasks involved 
in controlling the processing operation. There is one script for the processing tool 1310, including the review
station 1317 and the processing tool controller 1315. There is also a script to handle the actual data capture from
the review station 1317 and another script that contains common procedures that can be referenced by any of the
other scripts. There is also a script for the Advanced Process Control (APC) system manager 1440. The precise
number of scripts, however, is implementation specific and alternative embodiments may use other numbers of
scripts.

Operation of an Illustrative Apparatus. Figure 15 illustrates one particular embodiment 1500 of the
method 1100 in Figure 11. The method 1500 may be practiced with the apparatus 1300 illustrated in
Figures 13-14, but the invention is not so limited. The method 1500 may be practiced with any apparatus that
may perform the functions set forth in Figure 15. Furthermore, the method 1100 in Figure 11 may be practiced in
embodiments alternative to the method 1500 in Figure 15.

Referring now to all of Figures 13-15, the method 1500 begins with processing a lot of workpieces 1305 
through a processing tool, such as the processing tool 1310, as set forth in box 1510. In this particular
embodiment, the processing tool 1310 has been initialized for processing by the Advanced Process Control
(APC) system manager 1440 through the machine interface 1430 and the equipment interface 1410. In this
particular embodiment, before the processing tool 1310 is run, the Advanced Process Control (APC) system
manager script is called to initialize the processing tool 1310. At this step, the script records the identification
number of the processing tool 1310 and the lot number of the workpieces 1305. The identification number is then
stored against the lot number in a data store 1360. The rest of the script, such as the APCData call and the Setup
and StartMachine calls, are formulated with blank or dummy data in order to force the machine to use default
settings.

As part of this initialization, the initial setpoints for processing control are provided to the processing
tool controller 1315 over the line 1320. These initial setpoints may be determined and implemented in any
suitable manner known to the art. In this case, one or more wafer lots have been processed through substantially
the same or similar contexts or conditions as the current wafer lot, and have also been measured for processing
error(s) using the review station 1317. When this information exists, state estimates gleaned from the measured
error(s) and/or bias(es) are retrieved from the data store 1360. These processing control input signal settings
computed from the state estimates are then downloaded to the processing tool 1310.

The workpieces 1305 are processed through the processing tool 1310. This comprises, in the
embodiment illustrated, subjecting the workpieces 1305 to a rapid thermal anneal. The workpieces 1305 are
measured on the review station 1317 after their processing on the processing tool 1310. The review station 1317
examines the workpieces 1305 after they are processed for a number of errors, such as deviations from target
values, such as film thicknesses, etch depths, and the like. The data generated by the instruments of the review
station 1317 is passed to the machine interface 1430 via sensor interface 1415 and the line 1320. The review
station script begins with a number of Advanced Process Control (APC) commands for the collection of data.
The review station script then locks itself in place and activates a data available script. This script facilitates the
actual transfer of the data from the review station 1317 to the Advanced Process Control (APC) framework. Once
the transfer is completed, the script exits and unlocks the review station script. The interaction with the review
station 1317 is then generally complete.

As will be appreciated by those skilled in the art having the benefit of this disclosure, the data generated
by the review station 1317 should be preprocessed for use. Review stations, such as KLA review stations,
provide the control algorithms for measuring the control error. Each of the error measurements, in this particular
embodiment, corresponds to one of the processing control input signals on the line 1320 in a direct manner.
Before the error can be utilized to correct the processing control input signal, a certain amount of preprocessing is
generally completed.

For example, preprocessing may include outlier rejection. Outlier rejection is a gross error check 
ensuring that the received data is reasonable in light of the historical performance of the process. This procedure 
involves comparing each of the processing errors to its corresponding predetermined boundary parameter. In one
embodiment, even if one of the predetermined boundaries is exceeded, the error data from the entire
semiconductor wafer lot is generally rejected.

To determine the limits of the outlier rejection, thousands of actual semiconductor manufacturing
fabrication ("fab") data points may be collected. The standard deviation for each error parameter in this collection
of data is then calculated. In one embodiment, for outlier rejection, nine times the standard deviation (both
positive and negative) is generally chosen as the predetermined boundary. This was done primarily to ensure that
only the points that are significantly outside the normal operating conditions of the process are rejected.
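The outlier rejection described above can be sketched as follows. This is a minimal illustration that assumes the errors are deviations centered on zero, so that the boundary is plus or minus nine standard deviations; the function names, parameter names, and the tiny example history are hypothetical.

```python
import statistics
from typing import Dict, List


def boundaries_from_history(history: Dict[str, List[float]],
                            multiplier: float = 9.0) -> Dict[str, float]:
    """Compute one boundary per error parameter as multiplier * standard deviation."""
    return {name: multiplier * statistics.pstdev(values)
            for name, values in history.items()}


def lot_is_rejected(errors: Dict[str, float], boundaries: Dict[str, float]) -> bool:
    """Reject the whole lot if any single error exceeds its boundary (either sign)."""
    return any(abs(value) > boundaries[name] for name, value in errors.items())


# Hypothetical historical fab data (a real collection would hold thousands of points).
history = {"thickness_error": [-1.0, 1.0], "depth_error": [-2.0, 2.0]}
limits = boundaries_from_history(history)
```

A lot whose errors all fall inside the boundaries is accepted; a single excursion on any parameter rejects the error data for the entire lot, as in the embodiment described above.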

Preprocessing may also smooth the data, which is also known as filtering. Filtering is important because 
the error measurements are subject to a certain amount of randomness, such that the error significantly deviates in 
value. Filtering the review station data results in a more accurate assessment of the error in the processing control
input signal settings. In one embodiment, the processing control scheme utilizes a filtering procedure known as
an Exponentially-Weighted Moving Average ("EWMA") filter, although other filtering procedures can be
utilized in this context.

One embodiment for the EWMA filter is represented by Equation (1):

AVGN = W * MC + (1 - W) * AVGP (1)

where

AVGN = the new EWMA average;

W = a weight for the new average (AVGN);

MC = the current measurement; and

AVGP = the previous EWMA average.

The weight is an adjustable parameter that can be used to control the amount of filtering and is generally
between zero and one. The weight represents the confidence in the accuracy of the current data point. If the
measurement is considered accurate, the weight should be close to one. If there were a significant amount of
fluctuations in the process, then a number closer to zero would be appropriate.
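Equation (1) can be written directly as a small function. The sketch below uses hypothetical Python names for the symbols AVGN, W, MC, and AVGP and adds a range check on the weight, which is an assumption based on the statement that the weight is generally between zero and one.

```python
def ewma_update(previous_average: float, measurement: float, weight: float) -> float:
    """Equation (1): AVGN = W * MC + (1 - W) * AVGP."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight is expected to lie between zero and one")
    return weight * measurement + (1.0 - weight) * previous_average


# A weight near one trusts the current measurement; a weight near zero trusts history.
trusting = ewma_update(previous_average=10.0, measurement=20.0, weight=1.0)
smoothing = ewma_update(previous_average=10.0, measurement=20.0, weight=0.25)
```

With a previous average of 10 and a current measurement of 20, a weight of 0.25 moves the average only a quarter of the way toward the new measurement.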

In one embodiment, there are at least two techniques for utilizing the EWMA filtering process. The first
technique uses the previous average, the weight, and the current measurement as described above. Among the
advantages of utilizing the first implementation are ease of use and minimal data storage. One of the
disadvantages of utilizing the first implementation is that this method generally does not retain much process
information. Furthermore, the previous average calculated in this manner would be made up of every data point
that preceded it, which may be undesirable. The second technique retains only some of the data and calculates the
average from the raw data each time.

The manufacturing environment in the semiconductor manufacturing fab presents some unique
challenges. The order that the semiconductor wafer lots are processed through a processing tool may not
correspond to the order in which they are read on the review station. This could lead to the data points being
added to the EWMA average out of sequence. Semiconductor wafer lots may be analyzed more than once to
verify the error measurements. With no data retention, both readings would contribute to the EWMA average,
which may be an undesirable characteristic. Furthermore, some of the control threads may have low volume,
which may cause the previous average to be outdated such that it may not be able to accurately represent the error
in the processing control input signal settings.

The processing tool controller 1315, in this particular embodiment, uses limited storage of data to
calculate the EWMA filtered error, i.e., the first technique. Wafer lot data, including the lot number, the time the
lot was processed, and the multiple error estimates, are stored in the data store 1360 under the control thread
name. When a new set of data is collected, the stack of data is retrieved from data store 1360 and analyzed. The
lot number of the current lot being processed is compared to those in the stack. If the lot number matches any of
the data present there, the error measurements are replaced. Otherwise, the data point is added to the current stack
in chronological order, according to the time periods when the lots were processed. In one embodiment, any data
point within the stack that is over 128 hours old is removed. Once the aforementioned steps are complete, the
new filter average is calculated and stored to data store 1360.

Thus, the data is collected and preprocessed, and then processed to generate an estimate of the current
errors in the processing control input signal settings. First, the data is passed to a compiled Matlab® plug-in that
performs the outlier rejection criteria described above. The inputs to a plug-in interface are the multiple error
measurements and an array containing boundary values. The return from the plug-in interface is a single toggle
variable. A nonzero return denotes that it has failed the rejection criteria, otherwise the variable returns the
default value of zero and the script continues to process.

After the outlier rejection is completed, the data is passed to the EWMA filtering procedure. The 
controller data for the control thread name associated with the lot is retrieved, and all of the relevant operation
upon the stack of lot data is carried out. This comprises replacing redundant data or removing older data. Once
the data stack is adequately prepared, it is parsed into ascending time-ordered arrays that correspond to the error
values. These arrays are fed into the EWMA plug-in along with an array of the parameter required for its
execution. In one embodiment, the return from the plug-in is comprised of the six filtered error values.
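The stack handling described above (replace a remeasured lot, insert new lots in chronological order, drop data older than 128 hours, then filter) can be sketched as follows. The record layout, the function names, the use of hours as a plain number, and the choice to seed the filter with the oldest retained record are assumptions made for illustration; they are not details of the plug-in itself.

```python
from dataclasses import dataclass
from typing import List

MAX_AGE_HOURS = 128.0  # age limit stated for one embodiment


@dataclass
class LotRecord:
    lot_number: str
    processed_at: float      # processing time in hours (hypothetical representation)
    errors: List[float]      # one value per error measurement


def update_stack(stack: List[LotRecord], new: LotRecord, now: float) -> List[LotRecord]:
    """Return the stack with the new data merged in, pruned, and time-ordered."""
    updated: List[LotRecord] = []
    replaced = False
    for record in stack:
        if record.lot_number == new.lot_number:
            # Remeasured lot: replace the error measurements, keep the processing time.
            updated.append(LotRecord(record.lot_number, record.processed_at, list(new.errors)))
            replaced = True
        else:
            updated.append(record)
    if not replaced:
        updated.append(new)
    # Remove any data point that is over 128 hours old.
    updated = [r for r in updated if now - r.processed_at <= MAX_AGE_HOURS]
    # Keep the stack in chronological order of processing.
    updated.sort(key=lambda r: r.processed_at)
    return updated


def filtered_errors(stack: List[LotRecord], weight: float) -> List[float]:
    """Apply Equation (1) per error value, walking the stack from oldest to newest."""
    if not stack:
        return []
    averages = list(stack[0].errors)  # assumption: seed with the oldest record
    for record in stack[1:]:
        averages = [weight * m + (1.0 - weight) * a
                    for m, a in zip(record.errors, averages)]
    return averages
```

Because the average is recomputed from the retained records, a lot that is read twice contributes only once, and lots read out of order are still filtered in processing order.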

Returning to Figure 15, data preprocessing comprises monitoring and/or sampling workpiece 1305
parameter(s) characteristic of the processing tool 1310 variables, as set forth in box 1520. Known, potential
characteristic parameters may be identified by characteristic data patterns or may be identified as known
consequences of modifications to processing control. In turn, the processing control input parameters such as the
processing recipe control input parameters and/or the setpoints for workpiece temperature and/or lamp power
and/or anneal time and/or process gas temperature and/or process gas pressure and/or process gas flow rate and/or
radio frequency (RF) power and/or etch time and/or bias voltage and/or deposition time, and the like, may
directly affect the effective yield of usable semiconductor devices from the workpiece 1205.

The next step in the control process is to calculate the new settings for the processing tool
controller 1315 of the processing tool 1310. The previous settings for the control thread corresponding to the
current wafer lot are retrieved from the data store 1360. This data is paired along with the current set of
processing errors. The new settings are calculated by calling a compiled Matlab® plug-in. This application
incorporates a number of inputs, performs calculations in a separate execution component, and returns a number
of outputs to the main script. Generally, the inputs of the Matlab® plug-in are the processing control input signal
settings, the review station 1317 errors, an array of parameters that are necessary for the control algorithm, and a
currently unused flag error. The outputs of the Matlab® plug-in are the new controller settings, calculated in the
plug-in according to the controller algorithm described above.

A process engineer or a control engineer, who generally determines the actual form and
extent of the control action, can set the parameters. They include the threshold values, maximum step sizes,
controller weights, and target values. Once the new parameter settings are calculated, the script stores the settings
in the data store 1360 such that the processing tool 1310 can retrieve them for the next wafer lot to be processed.
The principles taught by the present invention can be implemented into other types of manufacturing
frameworks.
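
The role of the engineer-set parameters can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the controller algorithm of the plug-in: the class `TuningParams`, the function `next_setting`, and the linear model slope `gain` are hypothetical names, and the update rule (a deadband, a weighted correction, and a clamped step) is one common way such parameters are combined.

```python
from dataclasses import dataclass


@dataclass
class TuningParams:
    """Hypothetical container for the engineer-set parameters."""
    target: float     # target value for the measured characteristic
    threshold: float  # deadband: smaller errors trigger no change
    weight: float     # controller weight (fraction of error corrected per lot)
    max_step: float   # largest allowed change in the setting per lot


def next_setting(previous_setting, measured_value, gain, params):
    """Return the setting for the next lot (assumed rule, for illustration).

    `gain` is an assumed linear model slope: change in the measured
    characteristic per unit change in the setting.
    """
    error = params.target - measured_value
    if abs(error) < params.threshold:
        return previous_setting          # inside the deadband: keep the setting
    step = params.weight * error / gain  # weighted correction
    step = max(-params.max_step, min(params.max_step, step))  # clamp the step
    return previous_setting + step


params = TuningParams(target=100.0, threshold=0.5, weight=0.5, max_step=2.0)
small_error = next_setting(50.0, 99.8, 2.0, params)   # within threshold
normal_step = next_setting(50.0, 96.0, 2.0, params)   # weighted correction
clamped_step = next_setting(50.0, 80.0, 2.0, params)  # limited by max_step
```

In this sketch the previous setting plays the part of the value retrieved from the data store 1360, and the returned value is what would be stored for the next wafer lot.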

Returning again to Figure 15, the calculation of new settings comprises, as set forth in box 1530,
modeling the characteristic parameter(s) using an adaptive sampling processing model. This modeling may be
performed by the Matlab® plug-in. In this particular embodiment, only known, potential characteristic parameters
are modeled and the models are stored in a database 1335 accessed by a machine interface 1430. The
database 1335 may reside on the workstation 1330, as shown, or some other part of the Advanced Process
Control (APC) framework. For instance, the models might be stored in the data store 1360 managed by the
Advanced Process Control (APC) system manager 1440 in alternative embodiments. The model will generally be
a mathematical model, i.e., an equation describing how the change(s) in processing recipe control(s) affects the
processing performance, and the like. The models described in various illustrative embodiments given above, and
described more fully below, are examples of such models.

The particular model used will be implementation specific, depending upon the particular processing
tool 1310 and the particular characteristic parameter(s) being modeled. Whether the relationship in the model is
linear or non-linear will be dependent on the particular parameter(s) involved.

The new settings are then transmitted to and applied by the processing tool controller 1315. Thus,
returning now to Figure 15, once the characteristic parameter(s) are modeled, the model is applied to modify at
least one processing recipe control input parameter using at least one model predictive control (MPC) controller
or at least one proportional-integral-derivative (PID) controller, described more fully above, as set forth in
box 1540. In this particular embodiment, the machine interface 1430 retrieves the model from the database 1335,
plugs in the respective value(s), and determines the necessary change(s) in the processing recipe control input
parameter(s). The change is then communicated by the machine interface 1430 to the equipment interface 1410
over the line 1320. The equipment interface 1410 then implements the change.

The present embodiment furthermore provides that the models be updated. This comprises, as set forth
in boxes 1550-1560 of Figure 15, monitoring at least one effect of modifying the processing recipe control input
parameters (box 1550) and updating the applied model (box 1560) based on the effect(s) monitored. For instance,
various aspects of the operation of the processing tool 1310 will change as the processing tool 1310 ages. By
monitoring the effect of the processing recipe change(s) implemented as a result of the characteristic parameter
measurement, the necessary value could be updated to yield superior performance.
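
One way to picture the monitor-and-update step is an exponentially weighted moving average (EWMA) correction of a linear model's offset. The sketch below is an assumption made for illustration: the text does not specify the update rule, and the function names, the linear model form, and the `smoothing` factor are all hypothetical.

```python
def predict(slope, offset, setting):
    """Assumed linear model: predicted characteristic for a given setting."""
    return slope * setting + offset


def update_offset(slope, offset, setting, measured, smoothing=0.3):
    """Blend the offset implied by the latest measurement into the model.

    A slow drift of the tool (e.g., aging) shows up as a growing gap between
    the prediction and the measurement; the EWMA tracks that gap gradually.
    """
    observed_offset = measured - slope * setting
    return smoothing * observed_offset + (1.0 - smoothing) * offset


# The model predicted 21.0 for setting 10.0, but 22.0 was measured.
old_offset = 1.0
new_offset = update_offset(2.0, old_offset, 10.0, 22.0, smoothing=0.5)
new_prediction = predict(2.0, new_offset, 10.0)
```

A small smoothing factor makes the model react slowly to noisy measurements, while a larger one follows tool drift more quickly.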

As noted above, this particular embodiment implements an Advanced Process Control (APC) system.
Thus, changes are implemented "between" lots. The actions set forth in the boxes 1520-1560 are implemented
after the current lot is processed and before the second lot is processed, as set forth in box 1570 of Figure 15.
However, the invention is not so limited. Furthermore, as noted above, a lot may constitute any practicable
number of wafers from one to several thousand (or practically any finite number). What constitutes a "lot" is
implementation specific, and so the point of the fabrication process in which the updates occur will vary from
implementation to implementation.

As described above, in various illustrative embodiments of the present invention, an adaptive sampling
processing model may be applied to modify processing performed in a processing step. For example, an adaptive
sampling processing model may be formed by monitoring one or more tool variables and/or one or more
processing parameters during one or more processing runs. Examples of such tool variables and/or processing
parameters may comprise one or more pyrometer trace readings, one or more lamp power trace readings, one or
more tube temperature trace readings, one or more current readings, one or more infrared (IR) signal readings,
one or more optical emission spectrum readings, one or more process gas temperature readings, one or more
process gas pressure readings, one or more process gas flow rate readings, one or more etch depths, one or more
process layer thicknesses, one or more resistivity readings, and the like.

In mathematical terms, a set of m processing runs being measured and/or monitored over n processing
tool variables and/or processing parameters may be arranged as a rectangular n×m matrix X. In other words, the
rectangular n×m matrix X may be comprised of 1 to n rows (each row corresponding to a separate processing tool
variable or processing parameter) and 1 to m columns (each column corresponding to a separate processing run).
The values of the rectangular n×m matrix X may be the actually measured values of the processing tool variables
and/or processing parameters, or ratios of actually measured values (normalized to respective reference
setpoints), or logarithms of such ratios, for example. The rectangular n×m matrix X may have rank r, where
r≤min{m,n} is the maximum number of independent variables in the matrix X. The rectangular n×m matrix X
may be analyzed using Principal Components Analysis (PCA), for example. The use of PCA, for example,
generates a set of Principal Components P (whose "Loadings," or components, represent the contributions of the
various processing tool variables and/or processing parameters) as an eigenmatrix (a matrix whose columns are
eigenvectors) of the equation $((X-M)(X-M)^T)P = \Lambda^2 P$, where M is a rectangular n×m matrix of the mean
values of the columns of X (the m columns of M are each the column mean vector $\mu_{n\times 1}$ of $X_{n\times m}$),
$\Lambda^2$ is an n×n diagonal matrix of the squares of the eigenvalues $\lambda_i$, i=1,2,...,r, of the mean-scaled
matrix X-M, and a Scores matrix, T, with $X-M = PT^T$ and $(X-M)^T = (PT^T)^T = (T^T)^T P^T = TP^T$, so that
$((X-M)(X-M)^T)P = ((PT^T)(TP^T))P$ and $((PT^T)(TP^T))P = (P(T^TT)P^T)P = P(T^TT) = \Lambda^2 P$. The rectangular
n×m matrix X, also denoted $X_{n\times m}$, may have elements $x_{ij}$, where i=1,2,...,n, and j=1,2,...,m, and the
rectangular m×n matrix $X^T$, the transpose of the rectangular n×m matrix X, also denoted $(X^T)_{m\times n}$, may
have elements $x_{ji}$, where i=1,2,...,n, and j=1,2,...,m. The n×n matrix $(X-M)(X-M)^T$ is (m-1) times the
covariance matrix $S_{n\times n}$, having elements $s_{ij}$, where i=1,2,...,n, and j=1,2,...,n, defined so that:

$$s_{ij} = \frac{m\sum_{k=1}^{m} x_{ik}x_{jk} - \sum_{k=1}^{m} x_{ik}\sum_{k=1}^{m} x_{jk}}{m(m-1)},$$

corresponding to the rectangular n×m matrix $X_{n\times m}$.
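
The relation between the mean-scaled product $(X-M)(X-M)^T$ and the covariance matrix can be checked numerically. The following is a minimal pure-Python sketch, assuming rows are variables and columns are runs as in the text; the function names and the small 3×4 example matrix are chosen for illustration only.

```python
def mean_center(x):
    """Return X - M: subtract each variable's mean over the m runs."""
    return [[v - sum(row) / len(row) for v in row] for row in x]


def scatter(xc):
    """Return the n-by-n product (X - M)(X - M)^T of a centered matrix."""
    n = len(xc)
    return [[sum(a * b for a, b in zip(xc[i], xc[j])) for j in range(n)]
            for i in range(n)]


def covariance_from_scatter(x):
    """Covariance as (X - M)(X - M)^T divided by (m - 1)."""
    m = len(x[0])
    return [[v / (m - 1) for v in row] for row in scatter(mean_center(x))]


def covariance_direct(x):
    """Covariance from raw sums, using the element-wise formula for s_ij."""
    n, m = len(x), len(x[0])
    return [[(m * sum(a * b for a, b in zip(x[i], x[j]))
              - sum(x[i]) * sum(x[j])) / (m * (m - 1))
             for j in range(n)] for i in range(n)]


# Example: 3 variables (rows) measured over 4 runs (columns).
X = [[1, 1, 1, 1],
     [1, 0, 0, -1],
     [0, 1, -1, 0]]
S_a = covariance_from_scatter(X)
S_b = covariance_direct(X)
```

Both routes give the same matrix, which is the point of the (m-1) factor stated above.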

Although other methods may exist, four methods for computing Principal Components are as follows:

1. eigenanalysis (EIG);

2. singular value decomposition (SVD);

3. nonlinear iterative partial least squares (NIPALS); and

4. power method.

Each of the first two methods, EIG and SVD, simultaneously calculates all possible Principal
Components, whereas the NIPALS method allows for calculation of one Principal Component at a time.
The power method, described more fully below, is an iterative approach to finding eigenvalues and
eigenvectors, and also allows for calculation of one Principal Component at a time. There are as many Principal
Components as there are channels (or variable values). The power method may efficiently use computing time.

For example, consider the 3×2 matrix A, its transpose, the 2×3 matrix $A^T$, their 2×2 matrix product $A^TA$,
and their 3×3 matrix product $AA^T$:

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 1 & -1 \end{pmatrix}, \qquad
A^T = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & -1 \end{pmatrix},$$

$$A^TA = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}, \qquad
AA^T = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 2 \end{pmatrix}.$$

EIG reveals that the eigenvalues $\lambda$ of the matrix product $A^TA$ are 3 and 2. The eigenvectors of the matrix
product $A^TA$ are solutions t of the equation $(A^TA)t = \lambda t$, and may be seen by inspection to be
$t_1^T = (1,0)$ and $t_2^T = (0,1)$, belonging to the eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 2$, respectively.

The power method, for example, may be used to determine the eigenvalues $\lambda$ and eigenvectors p of the
matrix product $AA^T$, where the eigenvalues $\lambda$ and the eigenvectors p are solutions p of the equation
$(AA^T)p = \lambda p$. A trial eigenvector $p^T = (1,1,1)$ may be used:

$$(AA^T)p = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 2 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \\ 3 \end{pmatrix}
= 3\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \lambda_1 p_1.$$

This indicates that the trial eigenvector $p^T = (1,1,1)$ happened to correspond to the eigenvector
$p_1^T = (1,1,1)$ belonging to the eigenvalue $\lambda_1 = 3$. The power method then proceeds by subtracting the outer
product matrix $p_1p_1^T$ from the matrix product $AA^T$ to form a residual matrix $R_1$:

$$R_1 = AA^T - p_1p_1^T = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 2 \end{pmatrix}
- \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{pmatrix}.$$

Another trial eigenvector $p^T = (1,0,-1)$ may be used:

$$(AA^T - p_1p_1^T)p = R_1p = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \\ -2 \end{pmatrix}
= 2\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} = \lambda_2 p_2.$$

This indicates that the trial eigenvector $p^T = (1,0,-1)$ happened to correspond to the eigenvector
$p_2^T = (1,0,-1)$ belonging to the eigenvalue $\lambda_2 = 2$. The power method then proceeds by subtracting the outer
product matrix $p_2p_2^T$ from the residual matrix $R_1$ to form a second residual matrix $R_2$:

$$R_2 = R_1 - p_2p_2^T = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{pmatrix}
- \begin{pmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

The fact that the second residual matrix $R_2$ vanishes indicates that the eigenvalue $\lambda_3 = 0$ and that the
eigenvector $p_3$ is completely arbitrary. The eigenvector $p_3$ may be conveniently chosen to be orthogonal to the
eigenvectors $p_1^T = (1,1,1)$ and $p_2^T = (1,0,-1)$, so that the eigenvector $p_3^T = (1,-2,1)$. Indeed, one may
readily verify that:

$$(AA^T)p_3 = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 2 \end{pmatrix}
\begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
= 0\begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix} = \lambda_3 p_3.$$
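
The power method with successive residual matrices can be sketched in pure Python for the 3×3 product $AA^T$ with rows (2,1,0), (1,1,1), (0,1,2). This is an illustration under stated assumptions rather than the implementation used in any embodiment: each iterate is normalized to unit length and the residual is formed by subtracting $\lambda vv^T$ for the unit eigenvector v (Hotelling deflation), which coincides with subtracting $pp^T$ in the worked example only because $p^Tp$ happens to equal $\lambda$ there. The function names and the starting vector (1, 2, 4) are hypothetical.

```python
import math


def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def power_method(m, start, iters=500, tol=1e-12):
    """Return (eigenvalue, unit eigenvector) of the dominant eigenpair.

    Assumes m is symmetric with non-negative eigenvalues, as for AA^T.
    """
    norm = math.sqrt(sum(x * x for x in start))
    v = [x / norm for x in start]
    for _ in range(iters):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < tol:  # the matrix annihilates v: eigenvalue is zero
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
    return lam, v


def deflate(m, lam, v):
    """Form the residual matrix m - lam * v v^T for a unit vector v."""
    n = len(v)
    return [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]


def leading_eigenvalues(m, count, start):
    """Extract eigenvalues one at a time, largest first."""
    values = []
    for _ in range(count):
        lam, v = power_method(m, start)
        values.append(lam)
        m = deflate(m, lam, v)
    return values


AAT = [[2.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 2.0]]
eigenvalues = leading_eigenvalues(AAT, 3, [1.0, 2.0, 4.0])
```

The starting vector must not be orthogonal to the eigenvector being sought; a symmetric choice such as (1,1,1) would work for the first eigenvalue but is annihilated by the first residual matrix.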



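Similarly, SVD of A shows that $A = PT^T$, where P is the Principal Component matrix and T is the
Scores matrix:

$$A = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & 0 & -2/\sqrt{6} \\ 1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6} \end{pmatrix}
\begin{pmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{2} \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = PT^T.$$

SVD confirms that the singular values of A are $\sqrt{3}$ and $\sqrt{2}$, the positive square roots of the eigenvalues
$\lambda_1 = 3$ and $\lambda_2 = 2$ of the matrix product $A^TA$. Note that the columns of the Principal Component matrix P
are the orthonormalized eigenvectors of the matrix product $AA^T$.

Likewise, SVD of $A^T$ shows that $A^T = TP^T$:

$$A^T = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \sqrt{3} & 0 & 0 \\ 0 & \sqrt{2} & 0 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \\ 1/\sqrt{2} & 0 & -1/\sqrt{2} \\ 1/\sqrt{6} & -2/\sqrt{6} & 1/\sqrt{6} \end{pmatrix} = TP^T.$$

SVD confirms that the (non-zero) singular values of $A^T$ are $\sqrt{3}$ and $\sqrt{2}$, the positive square roots of the
eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 2$ of the matrix product $AA^T$. Note that the columns of the Principal Component
matrix P (the rows of the Principal Component matrix $P^T$) are the orthonormalized eigenvectors of the matrix
product $AA^T$. Also note that the non-zero elements of the Scores matrix T are the positive square roots $\sqrt{3}$ and
$\sqrt{2}$ of the (non-zero) eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 2$ of both of the matrix products $A^TA$ and $AA^T$.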
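
The factorization $A = PT^T$ can be checked by direct multiplication. The sketch below, in pure Python, assumes P has the orthonormalized eigenvectors $(1,1,1)/\sqrt{3}$, $(1,0,-1)/\sqrt{2}$ and $(1,-2,1)/\sqrt{6}$ of $AA^T$ as its columns and that $T^T$ carries the singular values $\sqrt{3}$ and $\sqrt{2}$ on its diagonal; the helper names are chosen for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(r) for r in zip(*m)]


s2, s3, s6 = math.sqrt(2.0), math.sqrt(3.0), math.sqrt(6.0)

# Columns of P: orthonormalized eigenvectors of A A^T.
P = [[1 / s3, 1 / s2, 1 / s6],
     [1 / s3, 0.0, -2 / s6],
     [1 / s3, -1 / s2, 1 / s6]]

# T^T: singular values on the diagonal, padded with a zero row.
T_t = [[s3, 0.0],
       [0.0, s2],
       [0.0, 0.0]]

A_rebuilt = matmul(P, T_t)     # should reproduce A
gram = matmul(transpose(P), P)  # should be the 3x3 identity
```

The product reproduces the rows (1,1), (1,0), (1,-1) of A up to rounding, and $P^TP$ is the identity, confirming that the columns of P are orthonormal.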

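Taking another example, consider the 4×3 matrix B, its transpose, the 3×4 matrix $B^T$, their 3×3 matrix
product $B^TB$, and their 4×4 matrix product $BB^T$:

$$B = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & -1 \\ 1 & -1 & 0 \end{pmatrix}, \qquad
B^T = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \end{pmatrix},$$

$$B^TB = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \qquad
BB^T = \begin{pmatrix} 2 & 1 & 1 & 0 \\ 1 & 2 & 0 & 1 \\ 1 & 0 & 2 & 1 \\ 0 & 1 & 1 & 2 \end{pmatrix}.$$

EIG reveals that the eigenvalues of the matrix product $B^TB$ are 4, 2 and 2. The eigenvectors of the
matrix product $B^TB$ are solutions t of the equation $(B^TB)t = \lambda t$, and may be seen by inspection to be
$t_1^T = (1,0,0)$, $t_2^T = (0,1,0)$, and $t_3^T = (0,0,1)$, belonging to the eigenvalues $\lambda_1 = 4$, $\lambda_2 = 2$,
and $\lambda_3 = 2$, respectively.

The power method, for example, may be used to determine the eigenvalues $\lambda$ and eigenvectors p of the
matrix product $BB^T$, where the eigenvalues $\lambda$ and the eigenvectors p are solutions p of the equation
$(BB^T)p = \lambda p$. A trial eigenvector $p^T = (1,1,1,1)$ may be used:

$$(BB^T)p = \begin{pmatrix} 2 & 1 & 1 & 0 \\ 1 & 2 & 0 & 1 \\ 1 & 0 & 2 & 1 \\ 0 & 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 4 \\ 4 \\ 4 \end{pmatrix}
= 4\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \lambda_1 p_1.$$

This indicates that the trial eigenvector $p^T = (1,1,1,1)$ happened to correspond to the eigenvector
$p_1^T = (1,1,1,1)$ belonging to the eigenvalue $\lambda_1 = 4$. The power method then proceeds by subtracting the outer
product matrix $p_1p_1^T$ from the matrix product $BB^T$ to form a residual matrix $R_1$:

$$R_1 = BB^T - p_1p_1^T = \begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix}.$$

Another trial eigenvector $p^T = (1,0,0,-1)$ may be used:

$$(BB^T - p_1p_1^T)p = R_1p = \begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \\ 0 \\ -2 \end{pmatrix}
= 2\begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix} = \lambda_2 p_2.$$

This indicates that the trial eigenvector $p^T = (1,0,0,-1)$ happened to correspond to the eigenvector
$p_2^T = (1,0,0,-1)$ belonging to the eigenvalue $\lambda_2 = 2$. The power method then proceeds by subtracting the outer
product matrix $p_2p_2^T$ from the residual matrix $R_1$ to form a second residual matrix $R_2$:

$$R_2 = R_1 - p_2p_2^T = \begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix}
- \begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

Another trial eigenvector $p^T = (0,1,-1,0)$ may be used:

$$(BB^T - p_1p_1^T - p_2p_2^T)p = R_2p = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ 1 \\ -1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \\ -2 \\ 0 \end{pmatrix}
= 2\begin{pmatrix} 0 \\ 1 \\ -1 \\ 0 \end{pmatrix} = \lambda_3 p_3.$$

This indicates that the trial eigenvector $p^T = (0,1,-1,0)$ happened to correspond to the eigenvector
$p_3^T = (0,1,-1,0)$ belonging to the eigenvalue $\lambda_3 = 2$. The power method then proceeds by subtracting the outer
product matrix $p_3p_3^T$ from the second residual matrix $R_2$ to form a third residual matrix $R_3$:

$$R_3 = R_2 - p_3p_3^T = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
- \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

The fact that the third residual matrix $R_3$ vanishes indicates that the eigenvalue $\lambda_4 = 0$ and that the
eigenvector $p_4$ is completely arbitrary. The eigenvector $p_4$ may be conveniently chosen to be orthogonal to the
eigenvectors $p_1^T = (1,1,1,1)$, $p_2^T = (1,0,0,-1)$, and $p_3^T = (0,1,-1,0)$, so that the eigenvector
$p_4^T = (1,-1,-1,1)$. Indeed, one may readily verify that:

$$(BB^T)p_4 = \begin{pmatrix} 2 & 1 & 1 & 0 \\ 1 & 2 & 0 & 1 \\ 1 & 0 & 2 & 1 \\ 0 & 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}
= 0\begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix} = \lambda_4 p_4.$$

In this case, since the eigenvalues $\lambda_2 = 2$ and $\lambda_3 = 2$ are equal, and, hence, degenerate, the eigenvectors
$p_2^T = (1,0,0,-1)$ and $p_3^T = (0,1,-1,0)$ belonging to the degenerate eigenvalues $\lambda_2 = 2 = \lambda_3$ may be
conveniently chosen to be orthonormal. A Gram-Schmidt orthonormalization procedure may be used, for example.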
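
A Gram-Schmidt procedure for a degenerate eigenvalue can be sketched as follows. The example is an assumption made for illustration: it starts from two non-orthogonal vectors, (1,1,-1,-1) and (1,0,0,-1), that both belong to the eigenvalue 2 of $BB^T$, and returns an orthonormal pair spanning the same eigenspace. The function names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = dot(w, u)  # component of w along an accepted basis vector
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:     # skip vectors that are linearly dependent
            basis.append([wi / norm for wi in w])
    return basis


BBT = [[2.0, 1.0, 1.0, 0.0],
       [1.0, 2.0, 0.0, 1.0],
       [1.0, 0.0, 2.0, 1.0],
       [0.0, 1.0, 1.0, 2.0]]

# Two non-orthogonal eigenvectors belonging to the degenerate eigenvalue 2.
basis = gram_schmidt([[1.0, 1.0, -1.0, -1.0], [1.0, 0.0, 0.0, -1.0]])
```

Because any linear combination of eigenvectors sharing an eigenvalue is again an eigenvector for that eigenvalue, the orthonormalized vectors remain eigenvectors of $BB^T$.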

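Similarly, SVD of B shows that $B = PT^T$, where P is the Principal Component matrix and T is the Scores
matrix:

$$B = \begin{pmatrix} 1/2 & 1/\sqrt{2} & 0 & 1/2 \\ 1/2 & 0 & 1/\sqrt{2} & -1/2 \\ 1/2 & 0 & -1/\sqrt{2} & -1/2 \\ 1/2 & -1/\sqrt{2} & 0 & 1/2 \end{pmatrix}
\begin{pmatrix} 2 & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & \sqrt{2} \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = PT^T.$$

SVD confirms that the singular values of B are 2, $\sqrt{2}$ and $\sqrt{2}$, the positive square roots of the eigenvalues
$\lambda_1 = 4$, $\lambda_2 = 2$ and $\lambda_3 = 2$ of the matrix product $B^TB$.

Likewise, SVD of $B^T$ shows that $B^T = TP^T$:

$$B^T = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & \sqrt{2} & 0 & 0 \\ 0 & 0 & \sqrt{2} & 0 \end{pmatrix}
\begin{pmatrix} 1/2 & 1/2 & 1/2 & 1/2 \\ 1/\sqrt{2} & 0 & 0 & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ 1/2 & -1/2 & -1/2 & 1/2 \end{pmatrix} = TP^T.$$

SVD confirms that the (non-zero) singular values of $B^T$ are 2, $\sqrt{2}$, and $\sqrt{2}$, the positive square roots of the
eigenvalues $\lambda_1 = 4$, $\lambda_2 = 2$ and $\lambda_3 = 2$ of the matrix product $BB^T$. Note that the columns of the Principal
Component matrix P (the rows of the Principal Component matrix $P^T$) are the orthonormalized eigenvectors of
the matrix product $BB^T$. Also note that the non-zero elements of the Scores matrix T are the positive square roots
2, $\sqrt{2}$, and $\sqrt{2}$ of the (non-zero) eigenvalues $\lambda_1 = 4$, $\lambda_2 = 2$ and $\lambda_3 = 2$ of both of the matrix products $B^TB$ and $BB^T$.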

The matrices A and B discussed above have been used for the sake of simplifying the presentation of
PCA and the power method, and are much smaller than the data matrices encountered in illustrative embodiments
of the present invention. For example, in various illustrative embodiments, about m = 100-600 processing runs
may be measured and/or monitored over n = 10-60 processing tool variables and/or processing parameters. Brute
force modeling, regressing all m = 100-600 runs over n = 10-60 variables, may constitute an ill-conditioned
regression problem. Techniques such as PCA and/or partial least squares (PLS, also known as projection to latent
structures) reduce the complexity in such cases by revealing the hierarchical ordering of the data based on levels
of decreasing variability. In PCA, this involves finding successive Principal Components. In PLS techniques such
as NIPALS, this involves finding successive latent vectors.

As shown in Figure 16, a scatterplot 1600 of data points 1610 may be plotted in an n-dimensional
variable space (n=3 in Figure 16). The mean vector 1620 may lie at the center of a p-dimensional Principal
Component ellipsoid 1630 (p=2 in Figure 16). The mean vector 1620 may be determined by taking the average of
the columns of the overall data matrix X. The Principal Component ellipsoid 1630 may have a first Principal
Component 1640 (major axis in Figure 16), with a length equal to the largest eigenvalue of the mean-scaled data
matrix X-M, and a second Principal Component 1650 (minor axis in Figure 16), with a length equal to the next
largest eigenvalue of the mean-scaled data matrix X-M.

For example, the 3×4 matrix $B^T$ given above may be taken as the overall data matrix X (again for the
sake of simplicity), corresponding to 4 runs over 3 variables. As shown in Figure 17, a scatterplot 1700 of data
points 1710 may be plotted in a 3-dimensional variable space. The mean vector 1720 $\mu$ may lie at the center of a
2-dimensional Principal Component ellipsoid 1730 (really a circle, a degenerate ellipsoid). The mean vector 1720
$\mu$ may be determined by taking the average of the columns of the overall 3×4 data matrix $B^T$. The Principal
Component ellipsoid 1730 may have a first Principal Component 1740 ("major" axis in Figure 17) and a second
Principal Component 1750 ("minor" axis in Figure 17). Here, the eigenvalues of the mean-scaled data matrix
$B^T-M$ are equal and degenerate, so the lengths of the "major" and "minor" axes in Figure 17 are equal. As shown
in Figure 17, the mean vector 1720 $\mu$ is given by:

$$\mu = \frac{1}{4}\left[\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}
+ \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} + \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}\right]
= \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},$$

and the matrix M has the mean vector 1720 $\mu$ for all 4 columns.



Principal Components Analysis (PCA) may be illustrated geometrically. For example, the 3×2 matrix C
(similar to the 3×2 matrix A given above):

$$C = \begin{pmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix}$$

may be taken as the overall data matrix X (again for the sake of simplicity), corresponding to 2 runs over 3
variables. As shown in Figure 18, a scatterplot 1800 of data points 1810 and 1820, with coordinates (1,1,1) and
(-1,0,1), respectively, may be plotted in a 3-dimensional variable space where the variables are respective rapid
thermal processing tool and/or parameter values for each of the 3 variables. The mean vector 1830 $\mu$ may lie at
the center of a 1-dimensional Principal Component ellipsoid 1840 (really a line, a very degenerate ellipsoid). The
mean vector 1830 $\mu$ may be determined by taking the average of the columns of the overall 3×2 data matrix C.
The Principal Component ellipsoid 1840 may have a first Principal Component 1850 (the "major" axis in
Figure 18, with length $\sqrt{5}$, lying along a first Principal Component axis 1860) and no second or third Principal
Component lying along second or third Principal Component axes 1870 and 1880, respectively. Here, two of the
eigenvalues of the mean-scaled data matrix C-M are equal to zero, so the lengths of the "minor" axes in Figure 18
are both equal to zero. As shown in Figure 18, the mean vector 1830 $\mu$ is given by:

$$\mu = \frac{1}{2}\left[\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}\right]
= \begin{pmatrix} 0 \\ 1/2 \\ 1 \end{pmatrix},$$

and the matrix M has the mean vector 1830 $\mu$ for both columns. As shown in Figure 18, PCA is nothing more than
a principal axis rotation of the original variable axes (here, the respective rapid thermal processing tool and/or
parameter values for each of the 3 variables) about the endpoint of the mean vector 1830 $\mu$, with coordinates
(0,1/2,1) with respect to the original coordinate axes and coordinates [0,0,0] with respect to the new Principal
Component axes 1860, 1870 and 1880. The Loadings are merely the direction cosines of the new Principal
Component axes 1860, 1870 and 1880 with respect to the original variable axes. The Scores are simply the
coordinates of the data points 1810 and 1820, $[\sqrt{5}/2,0,0]$ and $[-\sqrt{5}/2,0,0]$, respectively, referred to the new
Principal Component axes 1860, 1870 and 1880.

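The mean-scaled 3×2 data matrix C-M, its transpose, the 2×3 matrix $(C-M)^T$, their 2×2 matrix product
$(C-M)^T(C-M)$, and their 3×3 matrix product $(C-M)(C-M)^T$ are given by:

$$C-M = \begin{pmatrix} 1 & -1 \\ 1/2 & -1/2 \\ 0 & 0 \end{pmatrix}, \qquad
(C-M)^T = \begin{pmatrix} 1 & 1/2 & 0 \\ -1 & -1/2 & 0 \end{pmatrix},$$

$$(C-M)^T(C-M) = \begin{pmatrix} 5/4 & -5/4 \\ -5/4 & 5/4 \end{pmatrix}, \qquad
(C-M)(C-M)^T = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 1/2 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

The 3×3 matrix $(C-M)(C-M)^T$ is the covariance matrix $S_{3\times 3}$, having elements $s_{ij}$, where i=1,2,3, and
j=1,2,3, defined so that:

$$s_{ij} = \frac{2\sum_{k=1}^{2} x_{ik}x_{jk} - \sum_{k=1}^{2} x_{ik}\sum_{k=1}^{2} x_{jk}}{2(2-1)},$$

corresponding to the rectangular 3×2 matrix $C_{3\times 2}$.

EIG reveals that the eigenvalues $\lambda$ of the matrix product $(C-M)^T(C-M)$ are 5/2 and 0, for example, by
finding solutions to the secular equation:

$$\begin{vmatrix} 5/4-\lambda & -5/4 \\ -5/4 & 5/4-\lambda \end{vmatrix} = 0.$$

The eigenvectors of the matrix product $(C-M)^T(C-M)$ are solutions t of the equation $(C-M)^T(C-M)t = \lambda t$,
which may be rewritten as $((C-M)^T(C-M)-\lambda)t = 0$. For the eigenvalue $\lambda_1 = 5/2$, the eigenvector $t_1$ may be
seen by

$$\begin{pmatrix} 5/4-\lambda & -5/4 \\ -5/4 & 5/4-\lambda \end{pmatrix}t
= \begin{pmatrix} -5/4 & -5/4 \\ -5/4 & -5/4 \end{pmatrix}t = 0$$

to be $t_1^T = (1,-1)$. For the eigenvalue $\lambda_2 = 0$, the eigenvector $t_2$ may be seen by

$$\begin{pmatrix} 5/4-\lambda & -5/4 \\ -5/4 & 5/4-\lambda \end{pmatrix}t
= \begin{pmatrix} 5/4 & -5/4 \\ -5/4 & 5/4 \end{pmatrix}t = 0$$

to be $t_2^T = (1,1)$.

The power method, for example, may be used to determine the eigenvalues $\lambda$ and eigenvectors p of the
matrix product $(C-M)(C-M)^T$, where the eigenvalues $\lambda$ and the eigenvectors p are solutions p of the equation
$((C-M)(C-M)^T)p = \lambda p$. A trial eigenvector $p^T = (1,1,1)$ may be used:

$$((C-M)(C-M)^T)p = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 1/2 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 3/2 \\ 0 \end{pmatrix}
= 3\begin{pmatrix} 1 \\ 1/2 \\ 0 \end{pmatrix},$$

$$((C-M)(C-M)^T)\begin{pmatrix} 1 \\ 1/2 \\ 0 \end{pmatrix} = \begin{pmatrix} 5/2 \\ 5/4 \\ 0 \end{pmatrix}
= \frac{5}{2}\begin{pmatrix} 1 \\ 1/2 \\ 0 \end{pmatrix} = \lambda_1 p_1.$$

This illustrates that the trial eigenvector $p^T = (1,1,1)$ gets replaced by the improved trial eigenvector
$p^T = (1,1/2,0)$ that happened to correspond to the eigenvector $p_1^T = (1,1/2,0)$ belonging to the eigenvalue
$\lambda_1 = 5/2$. The power method then proceeds by subtracting the outer product matrix $p_1p_1^T$ from the matrix product
$(C-M)(C-M)^T$ to form a residual matrix $R_1$:

$$R_1 = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 1/2 & 0 \\ 0 & 0 & 0 \end{pmatrix}
- \begin{pmatrix} 1 \\ 1/2 \\ 0 \end{pmatrix}\begin{pmatrix} 1 & 1/2 & 0 \end{pmatrix}
= \begin{pmatrix} 2 & 1 & 0 \\ 1 & 1/2 & 0 \\ 0 & 0 & 0 \end{pmatrix}
- \begin{pmatrix} 1 & 1/2 & 0 \\ 1/2 & 1/4 & 0 \\ 0 & 0 & 0 \end{pmatrix}
= \begin{pmatrix} 1 & 1/2 & 0 \\ 1/2 & 1/4 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$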

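Another trial eigenvector $p^T = (-1,2,0)$, orthogonal to the eigenvector $p_1^T = (1,1/2,0)$, may be used:

$$((C-M)(C-M)^T - p_1p_1^T)p = R_1p = \begin{pmatrix} 1 & 1/2 & 0 \\ 1/2 & 1/4 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} -1 \\ 2 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
= 0\begin{pmatrix} -1 \\ 2 \\ 0 \end{pmatrix} = \lambda_2 p_2.$$

This indicates that the trial eigenvector $p^T = (-1,2,0)$ happened to correspond to the eigenvector
$p_2^T = (-1,2,0)$ belonging to the eigenvalue $\lambda_2 = 0$. The power method then proceeds by subtracting the outer
product matrix $p_2p_2^T$ from the residual matrix $R_1$ to form a second residual matrix $R_2$:

$$R_2 = R_1 - p_2p_2^T = \begin{pmatrix} 1 & 1/2 & 0 \\ 1/2 & 1/4 & 0 \\ 0 & 0 & 0 \end{pmatrix}
- \begin{pmatrix} -1 \\ 2 \\ 0 \end{pmatrix}\begin{pmatrix} -1 & 2 & 0 \end{pmatrix}
= \begin{pmatrix} 1 & 1/2 & 0 \\ 1/2 & 1/4 & 0 \\ 0 & 0 & 0 \end{pmatrix}
- \begin{pmatrix} 1 & -2 & 0 \\ -2 & 4 & 0 \\ 0 & 0 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & 5/2 & 0 \\ 5/2 & -15/4 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$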

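Another trial eigenvector $p^T = (0,0,1)$, orthogonal to the eigenvectors $p_1^T = (1,1/2,0)$ and $p_2^T = (-1,2,0)$,
may be used:

$$((C-M)(C-M)^T - p_1p_1^T - p_2p_2^T)p = R_2p = \begin{pmatrix} 0 & 5/2 & 0 \\ 5/2 & -15/4 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
= 0\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \lambda_3 p_3.$$

This indicates that the trial eigenvector $p^T = (0,0,1)$ happened to correspond to the eigenvector
$p_3^T = (0,0,1)$ belonging to the eigenvalue $\lambda_3 = 0$. Indeed, one may readily verify that:

$$((C-M)(C-M)^T)p_3 = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 1/2 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
= 0\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \lambda_3 p_3.$$

Similarly, SVD of C-M shows that $C-M = PT^T$, where P is the Principal Component matrix (whose
columns are orthonormalized eigenvectors proportional to $p_1$, $p_2$ and $p_3$, and whose elements are the Loadings,
the direction cosines of the new Principal Component axes 1860, 1870 and 1880 related to the original variable
axes) and T is the Scores matrix (whose rows are the coordinates of the data points 1810 and 1820, referred to the
new Principal Component axes 1860, 1870 and 1880):

$$C-M = \begin{pmatrix} 2/\sqrt{5} & -1/\sqrt{5} & 0 \\ 1/\sqrt{5} & 2/\sqrt{5} & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \sqrt{5}/\sqrt{2} & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}
= P\begin{pmatrix} \sqrt{5}/2 & -\sqrt{5}/2 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} = PT^T.$$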



The transpose of the Scores matrix ($T^T$) is given by the product of the matrix of eigenvalues of C-M with
a matrix whose rows are orthonormalized eigenvectors proportional to $t_1$ and $t_2$. As shown in Figure 18, the
direction cosine (Loading) of the first Principal Component axis 1860 with respect to the variable 1 values axis is
given by $\cos\Theta_{11} = 2/\sqrt{5}$, and the direction cosine (Loading) of the first Principal Component axis 1860 with
respect to the variable 2 values axis is given by $\cos\Theta_{21} = 1/\sqrt{5}$. Similarly, the direction cosine (Loading) of
the first Principal Component axis 1860 with respect to the variable 3 values axis is given by
$\cos\Theta_{31} = \cos(\pi/2) = 0$. Similarly, the direction cosine (Loading) of the second Principal Component
axis 1870 with respect to the variable 1 values axis is given by $\cos\Theta_{12} = -1/\sqrt{5}$, the direction cosine
(Loading) of the second Principal Component axis 1870 with respect to the variable 2 values axis is given by
$\cos\Theta_{22} = 2/\sqrt{5}$, and the direction cosine (Loading) of the second Principal Component axis 1870 with respect
to the variable 3 values axis is given by $\cos\Theta_{32} = \cos(\pi/2) = 0$. Lastly, the direction cosine (Loading) of the
third Principal Component axis 1880 with respect to the variable 1 values axis is given by
$\cos\Theta_{13} = \cos(\pi/2) = 0$, the direction cosine (Loading) of the third Principal Component axis 1880 with
respect to the variable 2 values axis is given by $\cos\Theta_{23} = \cos(\pi/2) = 0$, and the direction cosine (Loading) of
the third Principal Component axis 1880 with respect to the variable 3 values axis is given by
$\cos\Theta_{33} = \cos(0) = 1$.
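
The statement that the Scores are the coordinates of the data points referred to the new Principal Component axes can be checked with a short pure-Python sketch. It assumes the Loadings are the unit vectors along the eigenvectors (1,1/2,0), (-1,2,0) and (0,0,1) of $(C-M)(C-M)^T$; the function name `pca_scores` is chosen for illustration.

```python
import math


def pca_scores(points, loadings):
    """Project mean-centered data points onto the given unit axes.

    points   : list of data points (one tuple per processing run)
    loadings : list of unit vectors, one per Principal Component axis
    """
    n = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(n)]
    scores = []
    for p in points:
        centered = [x - mu for x, mu in zip(p, mean)]
        scores.append([sum(l * c for l, c in zip(axis, centered))
                       for axis in loadings])
    return mean, scores


r5 = math.sqrt(5.0)
# Rows are the direction cosines of axes 1860, 1870 and 1880 (assumed values
# obtained by normalizing the eigenvectors to unit length).
loadings = [[2 / r5, 1 / r5, 0.0],
            [-1 / r5, 2 / r5, 0.0],
            [0.0, 0.0, 1.0]]

mean, scores = pca_scores([(1.0, 1.0, 1.0), (-1.0, 0.0, 1.0)], loadings)
```

The two points land at plus and minus $\sqrt{5}/2$ on the first axis and at zero on the other two, so their separation along the first axis is $\sqrt{5}$.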

SVD confirms that the singular values of C-M are $\sqrt{5}/\sqrt{2}$ and 0, the non-negative square roots of the
eigenvalues $\lambda_1 = 5/2$ and $\lambda_2 = 0$ of the matrix product $(C-M)^T(C-M)$. Note that the columns of the Principal
Component matrix P are the orthonormalized eigenvectors of the matrix product $(C-M)(C-M)^T$.

Taking another example, a 3×4 matrix D (identical to the 3×4 matrix $B^T$ given above):

$$D = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \end{pmatrix}$$

may be taken as the overall data matrix X (again for the sake of simplicity), corresponding to 4 runs over 3
variables. As shown in Figure 19, a scatterplot 1900 of data points with coordinates (1,1,0), (1,0,1), (1,0,-1) and
(1,-1,0), respectively, may be plotted in a 3-dimensional variable space where the variables are respective rapid
thermal processing tool and/or parameter values for each of the 3 variables. The mean vector 1920 $\mu$ may lie at
the center of a 2-dimensional Principal Component ellipsoid 1930 (really a circle, a somewhat degenerate
ellipsoid). The mean vector 1920 $\mu$ may be determined by taking the average of the columns of the overall 3×4
data matrix D. The Principal Component ellipsoid 1930 may have a first Principal Component 1940 (the "major"
axis in Figure 19, with length 2, lying along a first Principal Component axis 1950), a second Principal
Component 1960 (the "minor" axis in Figure 19, also with length 2, lying along a second Principal Component
axis 1970), and no third Principal Component lying along a third Principal Component axis 1980. Here, two of
the eigenvalues of the mean-scaled data matrix D-M are equal, so the lengths of the "major" and "minor" axes of
the Principal Component ellipsoid 1930 in Figure 19 are both equal, and the remaining eigenvalue is equal to
zero, so the length of the other "minor" axis of the Principal Component ellipsoid 1930 in Figure 19 is equal to
zero. As shown in Figure 19, the mean vector 1920 $\mu$ is given by:

$$\mu = \frac{1}{4}\left[\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}
+ \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} + \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}\right]
= \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},$$

and the matrix M has the mean vector 1920 $\mu$ for all 4 columns. As shown in Figure 19, PCA is nothing more
than a principal axis rotation of the original variable axes (here, the respective rapid thermal processing tool
and/or parameter values for each of the 3 variables) about the endpoint of the mean vector 1920 $\mu$, with
coordinates (1,0,0) with respect to the original coordinate axes and coordinates [0,0,0] with respect to the new
Principal Component axes 1950, 1970 and 1980. The Loadings are merely the direction cosines of the new
Principal Component axes 1950, 1970 and 1980 with respect to the original variable axes. The Scores are simply
the coordinates of the data points, [1,0,0], [0,1,0], [0,-1,0] and [-1,0,0], respectively, referred to the new
Principal Component axes 1950, 1970 and 1980.
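The 3×3 matrix product $(D-M)(D-M)^T$ is given by:

$$(D-M)(D-M)^T = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$

The 3×3 matrix $(D-M)(D-M)^T$ is 3 times the covariance matrix $S_{3\times 3}$, having elements $s_{ij}$, where i=1,2,3,
and j=1,2,3, defined so that:

$$s_{ij} = \frac{4\sum_{k=1}^{4} x_{ik}x_{jk} - \sum_{k=1}^{4} x_{ik}\sum_{k=1}^{4} x_{jk}}{4(4-1)},$$

corresponding to the rectangular 3×4 matrix $D_{3\times 4}$.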



EIG reveals that the eigenvalues of the matrix product $(D-M)(D-M)^T$ are 0, 2 and 2. The eigenvectors of
the matrix product $(D-M)(D-M)^T$ are solutions p of the equation $((D-M)(D-M)^T)p = \lambda p$, and may be seen by
inspection to be $p_1^T = (0,1,0)$, $p_2^T = (0,0,1)$, and $p_3^T = (1,0,0)$, belonging to the eigenvalues $\lambda_1 = 2$,
$\lambda_2 = 2$, and $\lambda_3 = 0$, respectively (following the convention of placing the largest eigenvalue first).

As may be seen in Figure 19, the direction cosine (Loading) of the first Principal Component axis 1950
with respect to the variable 1 values axis is given by $\cos\Theta_{11} = \cos(\pi/2) = 0$, the direction cosine (Loading) of
the first Principal Component axis 1950 with respect to the variable 2 values axis is given by
$\cos\Theta_{21} = \cos(0) = 1$, and the direction cosine (Loading) of the first Principal Component axis 1950 with
respect to the variable 3 values axis is given by $\cos\Theta_{31} = \cos(\pi/2) = 0$. Similarly, the direction cosine
(Loading) of the second Principal Component axis 1970 with respect to the variable 1 values axis is given by
$\cos\Theta_{12} = \cos(\pi/2) = 0$, the direction cosine (Loading) of the second Principal Component axis 1970 with
respect to the variable 2 values axis is given by $\cos\Theta_{22} = \cos(\pi/2) = 0$, and the direction cosine (Loading) of
the second Principal Component axis 1970 with respect to the variable 3 values axis is given by
$\cos\Theta_{32} = \cos(0) = 1$. Lastly, the direction cosine (Loading) of the third Principal Component axis 1980 with
respect to the variable 1 values axis is given by $\cos\Theta_{13} = \cos(0) = 1$, the direction cosine (Loading) of the
third Principal Component axis 1980 with respect to the variable 2 values axis is given by
$\cos\Theta_{23} = \cos(\pi/2) = 0$, and the direction cosine (Loading) of the third Principal Component axis 1980 with
respect to the variable 3 values axis is given by $\cos\Theta_{33} = \cos(\pi/2) = 0$.



The transpose of the Scores matrix $T^T$ may be obtained simply by multiplying the mean-scaled data
matrix D-M on the left by the transpose of the Principal Component matrix P, whose columns are $p_1$, $p_2$, $p_3$, the
orthonormalized eigenvectors of the matrix product $(D-M)(D-M)^T$:

$$T^T = P^T(D-M) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

The columns of the transpose of the Scores matrix $T^T$ (or, equivalently, the rows of the Scores matrix T)
are, indeed, the coordinates of the data points, [1,0,0], [0,1,0], [0,-1,0] and [-1,0,0], respectively, referred to the
new Principal Component axes 1950, 1970 and 1980.
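
The product $T^T = P^T(D-M)$ can be reproduced with a few lines of pure Python. This is only a sketch: the helper names are chosen for illustration, and P is entered with the eigenvectors (0,1,0), (0,0,1) and (1,0,0) as its columns, as stated in the text.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(r) for r in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def center_rows(x):
    """Return X - M: subtract each variable's mean over the runs."""
    return [[v - sum(row) / len(row) for v in row] for row in x]


# Data matrix D: 3 variables (rows) over 4 runs (columns).
D = [[1, 1, 1, 1],
     [1, 0, 0, -1],
     [0, 1, -1, 0]]

# Principal Component matrix P; columns are p1=(0,1,0), p2=(0,0,1), p3=(1,0,0).
P = [[0, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]

scores_t = matmul(transpose(P), center_rows(D))  # T^T = P^T (D - M)
scores = transpose(scores_t)                     # rows of T: one per data point
```

Each row of `scores` is the coordinate triple of one data point in the rotated axes, matching the four points listed above.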

The matrices C and D discussed above have been used for the sake of simplifying the presentation of
PCA and the power method, and are much smaller than the data matrices encountered in illustrative embodiments
of the present invention. For example, in various illustrative embodiments, about m = 100-600 processing runs
may be measured and/or monitored over n = 10-60 processing tool variables and/or processing parameters. Brute
force modeling, regressing all m = 100-600 runs over n = 10-60 variables, may constitute an ill-conditioned
regression problem. Techniques such as PCA and/or partial least squares (PLS, also known as projection to latent
structures) reduce the complexity in such cases by revealing the hierarchical ordering of the data based on levels
of decreasing variability. In PCA, this involves finding successive Principal Components. In PLS techniques such
as NIPALS, this involves finding successive latent vectors. In various illustrative embodiments, the tool and/or
sensor drift during about m = 100-600 processing runs measured and/or monitored over n = 10-60 processing tool
variables and/or processing parameters may be mapped to an equivalent problem of the dynamic flow of about
m = 100-600 points (representing the m = 100-600 processing runs) through an n-dimensional space
(representing the n = 10-60 variables). PCA may be used, for example, to correct the rapid thermal processing by
indicating an appropriate multi-dimensional "rotation" to be made on the processing tool variables and/or
processing parameters to compensate for the tool and/or sensor drift from the respective setpoint values.

In various alternative illustrative embodiments, adaptive sampling processing models may be built in alternative ways. Such adaptive sampling processing models may also be formed by monitoring one or more tool variables and/or one or more processing parameters during one or more processing runs. Examples of such tool variables and/or processing parameters may comprise one or more pyrometer trace readings, one or more lamp power trace readings, one or more tube temperature trace readings, one or more current readings, one or more infrared (IR) signal readings, one or more optical emission spectrum readings, one or more process gas temperature readings, one or more process gas pressure readings, one or more process gas flow rate readings, one or more etch depths, one or more process layer thicknesses, one or more resistivity readings, and the like. In these various alternative illustrative embodiments, building the adaptive sampling processing models may comprise fitting the collected processing data using at least one of polynomial curve fitting, least-squares fitting, polynomial least-squares fitting, non-polynomial least-squares fitting, weighted least-squares fitting, weighted polynomial least-squares fitting, and weighted non-polynomial least-squares fitting, either in addition to, or as an alternative to, using Partial Least Squares (PLS) and/or Principal Components Analysis (PCA), as described above.

In various illustrative embodiments, samples may be collected for N+1 data points $(x_i, y_i)$, where i = 1, 2, ..., N, N+1, and a polynomial of degree N,

$$P_N(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_k x^k + \cdots + a_N x^N = \sum_{k=0}^{N} a_k x^k,$$

may be fit to the N+1 data points $(x_i, y_i)$. For example, 100 time data points (N = 99) may be taken relating the pyrometer trace reading p, the lamp power trace reading f, and/or the tube temperature trace reading T, during a processing step, to the effective yield t of workpieces emerging from the processing step, resulting in respective sets of N+1 data points $(p_i, t_i)$, $(f_i, t_i)$, and/or $(T_i, t_i)$. The values may be the actually measured values of the processing tool variables and/or processing parameters, or ratios of actually measured values (normalized to respective reference setpoints), or logarithms of such ratios, for example. Polynomial interpolation is described, for example, in Numerical Methods for Scientists and Engineers, by R.W. Hamming, Dover Publications, New York, 1986, at pages 230-235. The requirement that the polynomial $P_N(x)$ pass through the data points $(x_i, y_i)$ is

$$y_i = P_N(x_i) = \sum_{k=0}^{N} a_k x_i^k, \quad \text{for } i = 1, 2, \ldots, N, N+1,$$

a set of N+1 conditions. These N+1 conditions then completely determine the N+1 coefficients $a_k$, for k = 0, 1, ..., N.



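The determinant of the coefficients of the unknown coefficients $a_k$ is the Vandermonde determinant:

$$V_{N+1} = \begin{vmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^N \\ 1 & x_2 & x_2^2 & \cdots & x_2^N \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_N & x_N^2 & \cdots & x_N^N \\ 1 & x_{N+1} & x_{N+1}^2 & \cdots & x_{N+1}^N \end{vmatrix} = \left| x_i^k \right|,$$

where i = 1, 2, ..., N+1, and k = 0, 1, ..., N. The Vandermonde determinant $V_{N+1}$, considered as a function of the variables $x_i$, $V_{N+1} = V_{N+1}(x_1, x_2, \ldots, x_N, x_{N+1})$, is clearly a polynomial in the variables $x_i$, as may be seen by expanding out the determinant, and a count of the exponents shows that the degree of the polynomial is

$$0 + 1 + 2 + 3 + \cdots + k + \cdots + N = \sum_{k=0}^{N} k = \frac{N(N+1)}{2}$$

(for example, the diagonal term of the Vandermonde determinant $V_{N+1}$ is $1 \cdot x_2 \cdot x_3^2 \cdots x_N^{N-1} \cdot x_{N+1}^N$).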



Now, if $x_{N+1} = x_j$, for j = 1, 2, ..., N, then the Vandermonde determinant $V_{N+1} = 0$, since any determinant with two identical rows vanishes, so the Vandermonde determinant $V_{N+1}$ must have the factors $(x_{N+1} - x_j)$, for j = 1, 2, ..., N, corresponding to the N factors $\prod_{j=1}^{N} (x_{N+1} - x_j)$. Similarly, if $x_N = x_j$, for j = 1, 2, ..., N-1, then the Vandermonde determinant $V_{N+1} = 0$, so the Vandermonde determinant $V_{N+1}$ must also have the factors $(x_N - x_j)$, for j = 1, 2, ..., N-1, corresponding to the N-1 factors $\prod_{j=1}^{N-1} (x_N - x_j)$. Generally, if $x_m = x_j$, for j < m, where m = 2, ..., N, N+1, then the Vandermonde determinant $V_{N+1} = 0$, so the Vandermonde determinant $V_{N+1}$ must have all the factors $(x_m - x_j)$, for j < m, where m = 2, ..., N, N+1, corresponding to the factors $\prod_{1 \le j < m \le N+1} (x_m - x_j)$. Altogether, this represents a polynomial of degree

$$N + (N-1) + \cdots + k + \cdots + 2 + 1 = \sum_{k=1}^{N} k = \frac{N(N+1)}{2},$$

since, when m = N+1, for example, j may take on any of N values, j = 1, 2, ..., N, and when m = N, j may take on any of N-1 values, j = 1, 2, ..., N-1, and so forth (for example, when m = 3, j may take only two values, j = 1, 2, and when m = 2, j may take only one value, j = 1), which means that all the factors have been accounted for and all that remains is to find any multiplicative constant by which these two representations for the Vandermonde determinant $V_{N+1}$ might differ. As noted above, the diagonal term of the Vandermonde determinant $V_{N+1}$ is $1 \cdot x_2 \cdot x_3^2 \cdots x_N^{N-1} \cdot x_{N+1}^N$, and this may be compared to the term from the left-hand sides of the product of factors

$$\prod_{1 \le j < m \le N+1} (x_m - x_j) = \prod_{j=1}^{N} (x_{N+1} - x_j) \prod_{j=1}^{N-1} (x_N - x_j) \cdots \prod_{j=1}^{2} (x_3 - x_j) \prod_{j=1}^{1} (x_2 - x_j),$$

namely $x_{N+1}^N \cdot x_N^{N-1} \cdots x_3^2 \cdot x_2$, which is identical, so the multiplicative constant is unity and the Vandermonde determinant $V_{N+1}$ is

$$V_{N+1}(x_1, x_2, \ldots, x_{N+1}) = \left| x_i^k \right| = \prod_{1 \le j < m \le N+1} (x_m - x_j).$$
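The factorization of the Vandermonde determinant into the product of differences can be spot-checked numerically. The sketch below is illustrative only: the function names and the sample points are assumptions, and exact rational arithmetic from the Python standard library is used so that the comparison is not affected by round-off.

```python
from fractions import Fraction


def determinant(matrix):
    """Determinant by exact Gaussian elimination on Fractions."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def vandermonde_matrix(xs):
    """Rows (1, x_i, x_i^2, ..., x_i^N) for the N+1 sample points xs."""
    return [[Fraction(x) ** k for k in range(len(xs))] for x in xs]


def vandermonde_product(xs):
    """Product of (x_m - x_j) over all pairs with j < m."""
    product = Fraction(1)
    for m in range(len(xs)):
        for j in range(m):
            product *= Fraction(xs[m]) - Fraction(xs[j])
    return product


sample_points = [-1, 0, 1, 3]  # hypothetical, distinct sample points
print(determinant(vandermonde_matrix(sample_points)),
      vandermonde_product(sample_points))
```

When two sample points coincide, both expressions evaluate to zero, which is the case in which the interpolation conditions cannot be solved uniquely.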



This factorization of the Vandermonde determinant $V_{N+1}$ shows that if $x_i \neq x_j$, for $i \neq j$, then the Vandermonde determinant $V_{N+1}$ cannot be zero, which means that it is always possible to solve for the unknown coefficients $a_k$, since the Vandermonde determinant $V_{N+1}$ is the determinant of the coefficients of the unknown coefficients $a_k$. Solving for the unknown coefficients $a_k$ using determinants, for example, substituting the results into the polynomial of degree N, $P_N(x) = \sum_{k=0}^{N} a_k x^k$, and rearranging suitably gives the determinant equation

$$\begin{vmatrix} y & 1 & x & x^2 & \cdots & x^N \\ y_1 & 1 & x_1 & x_1^2 & \cdots & x_1^N \\ y_2 & 1 & x_2 & x_2^2 & \cdots & x_2^N \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ y_{N+1} & 1 & x_{N+1} & x_{N+1}^2 & \cdots & x_{N+1}^N \end{vmatrix} = 0,$$

which is the solution to the polynomial fit. This may be seen directly as follows. Expanding this determinant by the elements of the top row, this is clearly a polynomial of degree N. The coefficient of the element y in the first row in the expansion of this determinant by the elements of the top row is none other than the Vandermonde determinant $V_{N+1}$. In other words, the cofactor of the element y in the first row is, in fact, the Vandermonde determinant $V_{N+1}$. Indeed, the cofactor of the nth element in the first row, where n = 2, ..., N+2, is (up to sign) the product of the coefficient $a_{n-2}$ in the polynomial expansion $y = P_N(x) = \sum_{k=0}^{N} a_k x^k$ with the Vandermonde determinant $V_{N+1}$. Furthermore, if x and y take on any of the sample values $x_i$ and $y_i$, for i = 1, 2, ..., N, N+1, then two rows of the determinant would be the same and the determinant must then vanish. Thus, the requirement that the polynomial $y = P_N(x)$ pass through the data points $(x_i, y_i)$, $y_i = P_N(x_i) = \sum_{k=0}^{N} a_k x_i^k$, for i = 1, 2, ..., N, N+1, is satisfied.



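For example, a quadratic curve may be found that goes through the sample data set (-1,a), (0,b), and (1,c). The three equations are $P_2(-1) = a = a_0 - a_1 + a_2$, $P_2(0) = b = a_0$, and $P_2(1) = c = a_0 + a_1 + a_2$, which imply that $b = a_0$, $c - a = 2a_1$, and $c + a - 2b = 2a_2$, so that

$$y(x) = P_2(x) = b + \frac{c-a}{2}\,x + \frac{c+a-2b}{2}\,x^2,$$

which is also the result of expanding

$$\begin{vmatrix} y & 1 & x & x^2 \\ a & 1 & -1 & 1 \\ b & 1 & 0 & 0 \\ c & 1 & 1 & 1 \end{vmatrix} = y \begin{vmatrix} 1 & -1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \end{vmatrix} - 1 \cdot \begin{vmatrix} a & -1 & 1 \\ b & 0 & 0 \\ c & 1 & 1 \end{vmatrix} + x \begin{vmatrix} a & 1 & 1 \\ b & 1 & 0 \\ c & 1 & 1 \end{vmatrix} - x^2 \begin{vmatrix} a & 1 & -1 \\ b & 1 & 0 \\ c & 1 & 1 \end{vmatrix} = 0,$$

the coefficient of y being the respective Vandermonde determinant $V_3 = 2$.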



Similarly, a quartic curve may be found that goes through the sample data set (-2,a), (-1,b), (0,c), (1,b), and (2,a). The five equations are $P_4(-2) = a = a_0 - 2a_1 + 4a_2 - 8a_3 + 16a_4$, $P_4(-1) = b = a_0 - a_1 + a_2 - a_3 + a_4$, $P_4(0) = c = a_0$, $P_4(1) = b = a_0 + a_1 + a_2 + a_3 + a_4$, and $P_4(2) = a = a_0 + 2a_1 + 4a_2 + 8a_3 + 16a_4$, which imply that $c = a_0$, $0 = a_1 = a_3$ (which also follows from the symmetry of the data set), $(a-c) - 16(b-c) = -12a_2$, and $(a-c) - 4(b-c) = 12a_4$, so that

$$y(x) = P_4(x) = c - \frac{a - 16b + 15c}{12}\,x^2 + \frac{a - 4b + 3c}{12}\,x^4.$$
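The quadratic and quartic examples above can be reproduced by solving the N+1 interpolation conditions directly. The sketch below is a minimal illustration rather than the method of the disclosure: `solve_linear` and `interpolate` are hypothetical helper names, the numeric values substituted for a, b, and c are arbitrary, and exact rational arithmetic is assumed.

```python
from fractions import Fraction


def solve_linear(matrix, rhs):
    """Solve a square system exactly by Gauss-Jordan elimination.

    Raises StopIteration if the matrix is singular (no pivot found).
    """
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def interpolate(points):
    """Coefficients a_0..a_N of the degree-N polynomial through N+1 points."""
    xs = [Fraction(x) for x, _ in points]
    ys = [Fraction(y) for _, y in points]
    vandermonde = [[x ** k for k in range(len(xs))] for x in xs]
    return solve_linear(vandermonde, ys)


# Quadratic through (-1, a), (0, b), (1, c) with hypothetical a, b, c.
a, b, c = 5, 2, 7
quadratic = interpolate([(-1, a), (0, b), (1, c)])

# Quartic through the symmetric set (-2, a), (-1, b), (0, c), (1, b), (2, a).
a4, b4, c4 = 3, 1, 2
quartic = interpolate([(-2, a4), (-1, b4), (0, c4), (1, b4), (2, a4)])
print(quadratic, quartic)
```

For the symmetric quartic data set the odd coefficients come out as zero, in agreement with the symmetry remark in the text.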



In various alternative illustrative embodiments, samples may be collected for M data points $(x_i, y_i)$, where i = 1, 2, ..., M, and a first degree polynomial (a straight line),

$$P_1(x) = a_0 + a_1 x = \sum_{k=0}^{1} a_k x^k,$$

may be fit (in a least-squares sense) to the M data points $(x_i, y_i)$. For example, 100 time data points (M = 100) may be taken relating the pyrometer trace reading p, the lamp power trace reading f, and/or the tube temperature trace reading T, during a processing step, to the effective yield t of workpieces emerging from the processing step, resulting in the M data points $(p_i, t_i)$, $(f_i, t_i)$, and/or $(T_i, t_i)$. The values may be the actually measured values of the processing tool variables and/or processing parameters, or ratios of actually measured values (normalized to respective reference setpoints), or logarithms of such ratios, for example. Least-squares fitting is described, for example, in Numerical Methods for Scientists and Engineers, by R.W. Hamming, Dover Publications, New York, 1986, at pages 427-443.



The least-squares criterion may be used in situations where there is much more data available than parameters so that exact matching (to within round-off) is out of the question. Polynomials are most commonly used in least-squares matching, although any linear family of suitable functions may work as well. Suppose some quantity x is being measured by making M measurements $x_i$, for i = 1, 2, ..., M, and suppose that the measurements $x_i$ are related to the "true" quantity x by the relation $x_i = x + \varepsilon_i$, for i = 1, 2, ..., M, where the residuals $\varepsilon_i$ are regarded as noise. The principle of least-squares states that the best estimate $\xi$ of the true value x is the number that minimizes the sum of the squares of the deviations of the data from their estimate,

$$f(\xi) = \sum_{i=1}^{M} \varepsilon_i^2 = \sum_{i=1}^{M} (x_i - \xi)^2,$$

which is equivalent to the assumption that the average $x_a$, where $x_a = \frac{1}{M} \sum_{i=1}^{M} x_i$, is the best estimate $\xi$ of the true value x. This equivalence may be shown as follows. First, the principle of least-squares leads to the average $x_a$. Regarding $f(\xi) = \sum_{i=1}^{M} (x_i - \xi)^2$ as a function of the best estimate $\xi$, minimization with respect to the best estimate $\xi$ may proceed by differentiation:

$$\frac{df(\xi)}{d\xi} = -2 \sum_{i=1}^{M} (x_i - \xi) = 0,$$

which implies that $\sum_{i=1}^{M} x_i - \sum_{i=1}^{M} \xi = 0 = \sum_{i=1}^{M} x_i - M\xi$, so that $\xi = \frac{1}{M} \sum_{i=1}^{M} x_i = x_a$, or, in other words, that the choice $x_a = \xi$ minimizes the sum of the squares of the residuals $\varepsilon_i$. Noting also that $\frac{d^2 f(\xi)}{d\xi^2} = 2 \sum_{i=1}^{M} 1 = 2M > 0$, the criterion for a minimum is established.

Conversely, if the average $x_a$ is picked as the best choice $x_a = \xi$, it can be shown that this choice, indeed, minimizes the sum of the squares of the residuals $\varepsilon_i$. Set

$$f(x_a) = \sum_{i=1}^{M} (x_i - x_a)^2 = \sum_{i=1}^{M} x_i^2 - 2x_a \sum_{i=1}^{M} x_i + \sum_{i=1}^{M} x_a^2 = \sum_{i=1}^{M} x_i^2 - 2M x_a^2 + M x_a^2 = \sum_{i=1}^{M} x_i^2 - M x_a^2.$$

If any other value $x_b$ is picked, then, plugging that other value $x_b$ into f(x) gives

$$f(x_b) = \sum_{i=1}^{M} (x_i - x_b)^2 = \sum_{i=1}^{M} x_i^2 - 2x_b \sum_{i=1}^{M} x_i + \sum_{i=1}^{M} x_b^2 = \sum_{i=1}^{M} x_i^2 - 2x_b M x_a + M x_b^2.$$

Subtracting $f(x_a)$ from $f(x_b)$ gives $f(x_b) - f(x_a) = M\left[x_a^2 - 2x_a x_b + x_b^2\right] = M(x_a - x_b)^2 \ge 0$, so that $f(x_b) \ge f(x_a)$, with equality if, and only if, $x_b = x_a$. In other words, the average $x_a$, indeed, minimizes the sum of the squares of the residuals $\varepsilon_i$. Thus, it has been shown that the principle of least-squares and the choice of the average as the best estimate are equivalent.
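The identity $f(x_b) - f(x_a) = M(x_a - x_b)^2$ derived above may be verified on a small data set. The sketch below is illustrative: the measurement values and the function names are assumptions, and exact fractions are used so that the identity can be tested for equality.

```python
from fractions import Fraction


def sum_of_squares(measurements, estimate):
    """f(estimate): sum of squared deviations of the data from the estimate."""
    return sum((Fraction(x) - Fraction(estimate)) ** 2 for x in measurements)


def average(measurements):
    """The average x_a of the M measurements."""
    return sum(Fraction(x) for x in measurements) / len(measurements)


measurements = [3, 5, 4, 8, 5]  # hypothetical readings
x_a = average(measurements)
M = len(measurements)

# Compare the average against a few other candidate estimates x_b.
for x_b in [Fraction(9, 2), 5, 6, Fraction(27, 4)]:
    gap = sum_of_squares(measurements, x_b) - sum_of_squares(measurements, x_a)
    print(x_b, gap, M * (x_a - Fraction(x_b)) ** 2)
```

The two printed quantities agree for every candidate, and the gap is zero only when the candidate equals the average.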

There may be other choices besides the least-squares choice. Again, suppose some quantity x is being measured by making M measurements $x_i$, for i = 1, 2, ..., M, and suppose that the measurements $x_i$ are related to the "true" quantity x by the relation $x_i = x + \varepsilon_i$, for i = 1, 2, ..., M, where the residuals $\varepsilon_i$ are regarded as noise. An alternative to the least-squares choice may be that another estimate $\chi$ of the true value x is the number that minimizes the sum of the absolute values of the deviations of the data from their estimate,

$$f(\chi) = \sum_{i=1}^{M} |\varepsilon_i| = \sum_{i=1}^{M} |x_i - \chi|,$$

which is equivalent to the assumption that the median or middle value $x_m$ of the M measurements $x_i$, for i = 1, 2, ..., M (if M is even, then average the two middle values), is the other estimate $\chi$ of the true value x. Suppose that there are an odd number M = 2k+1 of measurements $x_i$, for i = 1, 2, ..., M, and choose the median or middle value $x_m$ as the estimate $\chi$ of the true value x that minimizes the sum of the absolute values of the residuals $\varepsilon_i$. Any upward shift in this value $x_m$ would increase the k terms $|x_i - \chi|$ that have $x_i$ below $x_m$, and would decrease the k terms $|x_i - \chi|$ that have $x_i$ above $x_m$, each by the same amount. However, the upward shift in this value $x_m$ would also increase the term $|x_m - \chi|$ and, thus, increase the sum of the absolute values of all the residuals $\varepsilon_i$. Yet another choice, instead of minimizing the sum of the squares of the residuals $\varepsilon_i$, would be to choose to minimize the maximum deviation, which leads to

$$x_{midrange} = \frac{x_{min} + x_{max}}{2},$$

the midrange estimate of the best value.
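The two alternative criteria described above, minimizing the sum of absolute deviations and minimizing the maximum deviation, can be compared on a small sample. This is a minimal sketch with hypothetical measurements and illustrative function names; the candidate grid is only a coarse numerical check, not a proof.

```python
from statistics import median


def sum_abs(measurements, estimate):
    """Sum of absolute deviations of the data from the estimate."""
    return sum(abs(x - estimate) for x in measurements)


def max_abs(measurements, estimate):
    """Largest absolute deviation of the data from the estimate."""
    return max(abs(x - estimate) for x in measurements)


def midrange(measurements):
    """Midpoint between the smallest and largest measurement."""
    return (min(measurements) + max(measurements)) / 2


measurements = [1, 2, 4, 7, 16]  # hypothetical readings, M = 2k + 1 = 5
x_m = median(measurements)
x_mid = midrange(measurements)

# Coarse grid of candidate estimates between 0 and 20.
candidates = [k / 10 for k in range(201)]
best_abs = min(candidates, key=lambda c: sum_abs(measurements, c))
best_max = min(candidates, key=lambda c: max_abs(measurements, c))
print(x_m, best_abs, x_mid, best_max)
```

On this sample the grid search lands on the median for the absolute-value criterion and on the midrange for the maximum-deviation criterion, while the average (6) minimizes neither.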

Returning to the various alternative illustrative embodiments in which samples may be collected for M data points $(x_i, y_i)$, where i = 1, 2, ..., M, and a first degree polynomial (a straight line), $P_1(x) = a_0 + a_1 x = \sum_{k=0}^{1} a_k x^k$, may be fit (in a least-squares sense) to the M data points $(x_i, y_i)$, there are two parameters, $a_0$ and $a_1$, and a function $F(a_0, a_1)$ that needs to be minimized as follows. The function $F(a_0, a_1)$ is given by

$$F(a_0, a_1) = \sum_{i=1}^{M} \varepsilon_i^2 = \sum_{i=1}^{M} \left[P_1(x_i) - y_i\right]^2 = \sum_{i=1}^{M} \left[a_0 + a_1 x_i - y_i\right]^2,$$

and setting the partial derivatives of $F(a_0, a_1)$ with respect to $a_0$ and $a_1$ equal to zero gives

$$\frac{\partial F(a_0, a_1)}{\partial a_0} = 2 \sum_{i=1}^{M} \left[a_0 + a_1 x_i - y_i\right] = 0 \quad \text{and} \quad \frac{\partial F(a_0, a_1)}{\partial a_1} = 2 \sum_{i=1}^{M} \left[a_0 + a_1 x_i - y_i\right] x_i = 0,$$

respectively. Simplifying and rearranging gives

$$a_0 M + a_1 \sum_{i=1}^{M} x_i = \sum_{i=1}^{M} y_i \quad \text{and} \quad a_0 \sum_{i=1}^{M} x_i + a_1 \sum_{i=1}^{M} x_i^2 = \sum_{i=1}^{M} x_i y_i,$$

respectively, where there are two equations for the two unknown parameters $a_0$ and $a_1$, readily yielding a solution.

As shown in Figure 20, for example, a first degree polynomial (a straight line), $P_1(x) = a_0 + a_1 x = \sum_{k=0}^{1} a_k x^k$, may be fit (in a least-squares sense) to the M = 5 data points (1,0), (2,2), (3,2), (4,5), and (5,4). The residuals $\varepsilon_i$, for i = 1, 2, ..., 5, are schematically illustrated in Figure 20. The equations for the two parameters $a_0$ and $a_1$ are $5a_0 + 15a_1 = 13$ and $15a_0 + 55a_1 = 50$, respectively, so that, upon multiplying the first equation by 3 and then subtracting that from the second equation, thereby eliminating $a_0$, the solution for the parameter $a_1$ becomes $a_1 = 11/10$, and this implies that the solution for the parameter $a_0$ becomes $a_0 = -7/10$. The first degree polynomial (the straight line) that provides the best fit, in the least-squares sense, is $P_1(x) = -\frac{7}{10} + \frac{11}{10}x = \frac{1}{10}(-7 + 11x)$, as shown in Figure 20.

As shown in Figure 21, for example, a first degree polynomial (a straight line), $P_1(x) = a_0 + a_1 x = \sum_{k=0}^{1} a_k x^k$, may be fit (in a least-squares sense) to the M = 7 data points (-3,4), (-2,4), (-1,2), (0,2), (1,1), (2,0), and (3,0). The residuals $\varepsilon_i$, for i = 1, 2, ..., 7, are schematically illustrated in Figure 21. The equations for the two parameters $a_0$ and $a_1$ are

$$a_0 M + a_1 \sum_{i=1}^{7} x_i = \sum_{i=1}^{7} y_i: \quad 7a_0 + a_1(-3-2-1+0+1+2+3) = (4+4+2+2+1+0+0)$$

and

$$a_0 \sum_{i=1}^{7} x_i + a_1 \sum_{i=1}^{7} x_i^2 = \sum_{i=1}^{7} x_i y_i: \quad a_1(9+4+1+0+1+4+9) = (-12-8-2+0+1+0+0),$$

respectively, which give $7a_0 = 13$ and $28a_1 = -21$, respectively. In other words, $a_0 = 13/7$ and $a_1 = -3/4$, so that the first degree polynomial (the straight line) that provides the best fit, in the least-squares sense, is $P_1(x) = \frac{13}{7} - \frac{3}{4}x$, as shown in Figure 21.
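The two straight-line examples of Figures 20 and 21 can be reproduced from the pair of normal equations. The sketch below assumes exact rational arithmetic; `fit_line` is an illustrative name, and the closed-form elimination it uses is one standard way of solving the two equations rather than a procedure stated in the text.

```python
from fractions import Fraction


def fit_line(points):
    """Least-squares straight line a0 + a1*x through the given points.

    Solves  a0*M      + a1*sum(x)   = sum(y)
            a0*sum(x) + a1*sum(x^2) = sum(x*y)
    by eliminating a0.
    """
    m = len(points)
    sx = sum(Fraction(x) for x, _ in points)
    sy = sum(Fraction(y) for _, y in points)
    sxx = sum(Fraction(x) ** 2 for x, _ in points)
    sxy = sum(Fraction(x) * Fraction(y) for x, y in points)
    a1 = (m * sxy - sx * sy) / (m * sxx - sx * sx)
    a0 = (sy - a1 * sx) / m
    return a0, a1


figure_20 = [(1, 0), (2, 2), (3, 2), (4, 5), (5, 4)]
figure_21 = [(-3, 4), (-2, 4), (-1, 2), (0, 2), (1, 1), (2, 0), (3, 0)]

print(fit_line(figure_20))  # coefficients (a0, a1) for the Figure 20 data
print(fit_line(figure_21))  # coefficients (a0, a1) for the Figure 21 data
```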

In various other alternative illustrative embodiments, samples may be collected for M data points $(x_i, y_i)$, where i = 1, 2, ..., M, and a polynomial of degree N,

$$P_N(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_k x^k + \cdots + a_N x^N = \sum_{k=0}^{N} a_k x^k,$$

may be fit (in a least-squares sense) to the M data points $(x_i, y_i)$. For example, 100 time data points (M = 100) may be taken relating the pyrometer trace reading p, the lamp power trace reading f, and/or the tube temperature trace reading T, during a processing step, to the effective yield t of workpieces emerging from the processing step, resulting in the M data points $(p_i, t_i)$, $(f_i, t_i)$, and/or $(T_i, t_i)$. The values may be the actually measured values of the processing tool variables and/or processing parameters, or ratios of actually measured values (normalized to respective reference setpoints), or logarithms of such ratios, for example. In one illustrative embodiment, the degree N of the polynomial is at least 10 times smaller than M.



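The function $F(a_0, a_1, \ldots, a_N)$ may be minimized as follows. The function $F(a_0, a_1, \ldots, a_N)$ is given by

$$F(a_0, a_1, \ldots, a_N) = \sum_{i=1}^{M} \varepsilon_i^2 = \sum_{i=1}^{M} \left[P_N(x_i) - y_i\right]^2,$$

and setting the partial derivatives of $F(a_0, a_1, \ldots, a_N)$ with respect to $a_j$, for j = 0, 1, ..., N, equal to zero gives

$$\frac{\partial F(a_0, a_1, \ldots, a_N)}{\partial a_j} = 2 \sum_{i=1}^{M} \left[P_N(x_i) - y_i\right] x_i^j = 2 \sum_{i=1}^{M} \left[\sum_{k=0}^{N} a_k x_i^k - y_i\right] x_i^j = 0,$$

for j = 0, 1, ..., N, since $(x_i)^j$ is the coefficient of $a_j$ in the polynomial $P_N(x_i) = \sum_{k=0}^{N} a_k x_i^k$. Simplifying and rearranging gives

$$\sum_{k=0}^{N} a_k \left[\sum_{i=1}^{M} x_i^{k+j}\right] = \sum_{k=0}^{N} a_k S_{k+j} = \sum_{i=1}^{M} x_i^j y_i = T_j,$$

for j = 0, 1, ..., N, where $\sum_{i=1}^{M} x_i^{k+j} \equiv S_{k+j}$ and $\sum_{i=1}^{M} x_i^j y_i \equiv T_j$, respectively. There are N+1 equations $\sum_{k=0}^{N} a_k S_{k+j} = T_j$, for j = 0, 1, ..., N, also known as the normal equations, for the N+1 unknown parameters $a_k$, for k = 0, 1, ..., N, readily yielding a solution, provided that the determinant of the normal equations is not zero. This may be demonstrated by showing that the homogeneous equations $\sum_{k=0}^{N} a_k S_{k+j} = 0$ only have the trivial solution $a_k = 0$, for k = 0, 1, ..., N, which may be shown as follows. Multiply the jth homogeneous equation by $a_j$ and sum over all j, from j = 0 to j = N,

$$\sum_{j=0}^{N} a_j \sum_{k=0}^{N} a_k S_{k+j} = \sum_{j=0}^{N} a_j \sum_{k=0}^{N} a_k \sum_{i=1}^{M} x_i^k x_i^j = \sum_{i=1}^{M} \left(\sum_{k=0}^{N} a_k x_i^k\right) \left(\sum_{j=0}^{N} a_j x_i^j\right) = \sum_{i=1}^{M} \left(P_N(x_i)\right)^2 = 0,$$

which would imply that $P_N(x_i) \equiv 0$, and, hence, that $a_k = 0$, for k = 0, 1, ..., N, the trivial solution (a polynomial of degree N that vanishes at more than N distinct sample points must vanish identically). Therefore, the determinant of the normal equations is not zero, and the normal equations may be solved for the N+1 parameters $a_k$, for k = 0, 1, ..., N, the coefficients of the least-squares polynomial of degree N, $P_N(x) = \sum_{k=0}^{N} a_k x^k$, that may be fit to the M data points $(x_i, y_i)$.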



Finding the least-squares polynomial of degree N, $P_N(x) = \sum_{k=0}^{N} a_k x^k$, that may be fit to the M data points $(x_i, y_i)$ may not be easy when the degree N of the least-squares polynomial is very large. The normal equations $\sum_{k=0}^{N} a_k S_{k+j} = T_j$, for j = 0, 1, ..., N, for the N+1 unknown parameters $a_k$, for k = 0, 1, ..., N, may not be easy to solve, for example, when the degree N of the least-squares polynomial is much greater than about 10. This may be demonstrated as follows. Suppose that the M data points $(x_i, y_i)$ are more or less uniformly distributed in the interval $0 \le x \le 1$, so that

$$S_{k+j} = \sum_{i=1}^{M} x_i^{k+j} \approx M \int_0^1 x^{k+j}\,dx = \frac{M}{k+j+1}.$$

The resulting determinant for the normal equations is then approximately given by

$$\left| S_{k+j} \right| \approx \left| \frac{M}{k+j+1} \right| = M^{N+1} H_{N+1},$$

for j, k = 0, 1, ..., N, where $H_N = \left| \frac{1}{k+j+1} \right|$, for j, k = 0, 1, ..., N-1, is the Hilbert determinant of order N, which has the value

$$H_N = \frac{\left[0!\,1!\,2!\,3! \cdots (N-1)!\right]^3}{N!\,(N+1)!\,(N+2)! \cdots (2N-1)!}$$

that approaches zero very rapidly. For example, $H_1 = 1$, $H_2 = 1/12$, and $H_3 = 1/2160$. Consequently, the system of normal equations is ill-conditioned and, hence, difficult to solve when the degree N of the least-squares polynomial is very large. Sets of orthogonal polynomials tend to be better behaved.
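How quickly the Hilbert determinant shrinks can be seen by evaluating the factorial formula for a few orders and comparing it with a direct determinant. This is a sketch under stated assumptions: the function names are illustrative, the formula is the standard closed form for the Hilbert determinant, and exact fractions are used because floating-point elimination would itself suffer from the ill-conditioning being illustrated.

```python
from fractions import Fraction
from math import factorial


def hilbert_determinant(order):
    """Closed form [0! 1! ... (N-1)!]^3 / [N! (N+1)! ... (2N-1)!]."""
    numerator = 1
    for k in range(order):
        numerator *= factorial(k)
    denominator = 1
    for k in range(order, 2 * order):
        denominator *= factorial(k)
    return Fraction(numerator ** 3, denominator)


def determinant(matrix):
    """Exact determinant by Gaussian elimination on Fractions."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def hilbert_matrix(order):
    """Entries 1/(k + j + 1) for j, k = 0..order-1."""
    return [[Fraction(1, k + j + 1) for k in range(order)]
            for j in range(order)]


for order in range(1, 7):
    print(order, hilbert_determinant(order))
```

By order 6 the determinant is already smaller than $10^{-17}$, which is consistent with the remark that the normal equations become hard to solve as the degree grows.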

As shown in Figure 22, for example, a second degree polynomial (a quadratic), $P_2(x) = a_0 + a_1 x + a_2 x^2 = \sum_{k=0}^{2} a_k x^k$, may be fit (in a least-squares sense) to the M = 7 data points (-3,4), (-2,2), (-1,3), (0,0), (1,-1), (2,-2), and (3,-5). The residuals $\varepsilon_i$, for i = 1, 2, ..., 7, are schematically illustrated in Figure 22. The three normal equations are $\sum_{k=0}^{2} a_k S_{k+j} = T_j$, for j = 0, 1, 2, where $\sum_{i=1}^{7} x_i^{k+j} \equiv S_{k+j}$ and $\sum_{i=1}^{7} x_i^j y_i \equiv T_j$, respectively, for the three parameters $a_0$, $a_1$, and $a_2$. This gives $\sum_{k=0}^{2} a_k S_k = T_0$, $\sum_{k=0}^{2} a_k S_{k+1} = T_1$, and $\sum_{k=0}^{2} a_k S_{k+2} = T_2$, where

$S_0 = \sum_{i=1}^{7} x_i^0 = 7$; $S_1 = \sum_{i=1}^{7} x_i = (-3-2-1+0+1+2+3) = 0$; $S_2 = \sum_{i=1}^{7} x_i^2 = (9+4+1+0+1+4+9) = 28$;
$S_3 = \sum_{i=1}^{7} x_i^3 = (-27-8-1+0+1+8+27) = 0$; $S_4 = \sum_{i=1}^{7} x_i^4 = (81+16+1+0+1+16+81) = 196$;
$T_0 = \sum_{i=1}^{7} y_i = (4+2+3+0-1-2-5) = 1$; $T_1 = \sum_{i=1}^{7} x_i y_i = (-12-4-3+0-1-4-15) = -39$; and
$T_2 = \sum_{i=1}^{7} x_i^2 y_i = (36+8+3+0-1-8-45) = -7$,

so that the normal equations become $\sum_{k=0}^{2} a_k S_k = T_0 = 1 = 7a_0 + 0a_1 + 28a_2 = 7a_0 + 28a_2$, $\sum_{k=0}^{2} a_k S_{k+1} = T_1 = -39 = 0a_0 + 28a_1 + 0a_2$, and $\sum_{k=0}^{2} a_k S_{k+2} = T_2 = -7 = 28a_0 + 0a_1 + 196a_2 = 28a_0 + 196a_2$, respectively, which imply (upon multiplying the first normal equation by 7 and then subtracting that from the third normal equation) that $-14 = -21a_0$, that $28a_1 = -39$ (from the second normal equation), and (upon multiplying the first normal equation by 4 and then subtracting that from the third normal equation) that $-11 = 84a_2$, giving $3a_0 = 2$, $28a_1 = -39$, and $84a_2 = -11$, respectively. In other words, $a_0 = 2/3$, $a_1 = -39/28$, and $a_2 = -11/84$, so that the second degree polynomial (the quadratic) that provides the best fit, in the least-squares sense, is

$$P_2(x) = \frac{2}{3} - \frac{39}{28}x - \frac{11}{84}x^2 = \frac{1}{84}\left(56 - 117x - 11x^2\right),$$

as shown in Figure 22.
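The normal equations $\sum_k a_k S_{k+j} = T_j$ can be assembled and solved mechanically for any small degree. The following minimal sketch reproduces the Figure 22 coefficients; `solve_linear` and `fit_polynomial` are hypothetical helper names, and exact fractions are assumed so the result can be compared with the hand calculation.

```python
from fractions import Fraction


def solve_linear(matrix, rhs):
    """Solve a square system exactly by Gauss-Jordan elimination.

    Raises StopIteration if the matrix is singular (no pivot found).
    """
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def fit_polynomial(points, degree):
    """Least-squares coefficients a_0..a_N from the normal equations."""
    # S_p = sum of x_i^p for p = 0..2N; T_j = sum of x_i^j * y_i for j = 0..N.
    s = [sum(Fraction(x) ** p for x, _ in points)
         for p in range(2 * degree + 1)]
    t = [sum(Fraction(x) ** j * Fraction(y) for x, y in points)
         for j in range(degree + 1)]
    matrix = [[s[k + j] for k in range(degree + 1)]
              for j in range(degree + 1)]
    return solve_linear(matrix, t)


figure_22 = [(-3, 4), (-2, 2), (-1, 3), (0, 0), (1, -1), (2, -2), (3, -5)]
coefficients = fit_polynomial(figure_22, degree=2)
print(coefficients)
```

The same routine applied to points that lie exactly on a straight line returns a zero quadratic coefficient, since the residuals can then be driven to zero by the line alone.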



As shown in Figure 23, for example, a second degree polynomial (a quadratic), $P_2(x) = a_0 + a_1 x + a_2 x^2 = \sum_{k=0}^{2} a_k x^k$, may be fit (in a least-squares sense) to the M = 6 data points (0,4), (1,7), (2,10), (3,13), (4,16), and (5,19). The residuals $\varepsilon_i$, for i = 1, 2, ..., 6, are schematically illustrated in Figure 23. The three normal equations are $\sum_{k=0}^{2} a_k S_{k+j} = T_j$, for j = 0, 1, 2, where $\sum_{i=1}^{6} x_i^{k+j} \equiv S_{k+j}$ and $\sum_{i=1}^{6} x_i^j y_i \equiv T_j$, respectively, for the three parameters $a_0$, $a_1$, and $a_2$. This gives $\sum_{k=0}^{2} a_k S_k = T_0$, $\sum_{k=0}^{2} a_k S_{k+1} = T_1$, and $\sum_{k=0}^{2} a_k S_{k+2} = T_2$, where

$S_0 = \sum_{i=1}^{6} x_i^0 = 6$; $S_1 = \sum_{i=1}^{6} x_i = (0+1+2+3+4+5) = 15$; $S_2 = \sum_{i=1}^{6} x_i^2 = (0+1+4+9+16+25) = 55$;
$S_3 = \sum_{i=1}^{6} x_i^3 = (0+1+8+27+64+125) = 225$; $S_4 = \sum_{i=1}^{6} x_i^4 = (0+1+16+81+256+625) = 979$;
$T_0 = \sum_{i=1}^{6} y_i = (4+7+10+13+16+19) = 69$; $T_1 = \sum_{i=1}^{6} x_i y_i = (0+7+20+39+64+95) = 225$; and
$T_2 = \sum_{i=1}^{6} x_i^2 y_i = (0+7+40+117+256+475) = 895$,

so that the normal equations become $\sum_{k=0}^{2} a_k S_k = T_0 = 69 = 6a_0 + 15a_1 + 55a_2$, $\sum_{k=0}^{2} a_k S_{k+1} = T_1 = 225 = 15a_0 + 55a_1 + 225a_2$, and $\sum_{k=0}^{2} a_k S_{k+2} = T_2 = 895 = 55a_0 + 225a_1 + 979a_2$, respectively, which imply (upon multiplying the second normal equation by 4 and then subtracting that from the first normal equation multiplied by 10) that $-210 = -70a_1 - 350a_2$, and (upon multiplying the second normal equation by 11 and then subtracting that from the third normal equation multiplied by 3) that $210 = 70a_1 + 462a_2$. However, adding these last two results together shows that $0 = a_2$. Furthermore, $3 = a_1$. Therefore, using the fact that $3 = a_1$ and $0 = a_2$, the normal equations become $\sum_{k=0}^{2} a_k S_k = T_0 = 69 = 6a_0 + 45$, $\sum_{k=0}^{2} a_k S_{k+1} = T_1 = 225 = 15a_0 + 165$, and $\sum_{k=0}^{2} a_k S_{k+2} = T_2 = 895 = 55a_0 + 675$, respectively, which all imply that $4 = a_0$. In other words, $a_0 = 4$, $a_1 = 3$, and $a_2 = 0$, so that the second degree polynomial (the quadratic) that provides the best fit, in the least-squares sense, is $P_2(x) = 4 + 3x + 0x^2 = 4 + 3x$, which is really just a straight line, as shown in Figure 23. The residuals $\varepsilon_i$, for i = 1, 2, ..., 6, all vanish identically in this case, as schematically illustrated in Figure 23.

In various other alternative illustrative embodiments, samples may be collected for M data points $(x_i, y_i)$, where i = 1, 2, ..., M, and a linearly independent set of N+1 functions $f_j(x)$, for j = 0, 1, 2, ..., N,

$$y(x) = a_0 f_0(x) + a_1 f_1(x) + \cdots + a_j f_j(x) + \cdots + a_N f_N(x) = \sum_{j=0}^{N} a_j f_j(x),$$

may be fit (in a non-polynomial least-squares sense) to the M data points $(x_i, y_i)$. For example, 100 time data points (M = 100) may be taken relating the pyrometer trace reading p, the lamp power trace reading f, and/or the tube temperature trace reading T, during a processing step, to the effective yield t of workpieces emerging from the processing step, resulting in the M data points $(p_i, t_i)$, $(f_i, t_i)$, and/or $(T_i, t_i)$. The values may be the actually measured values of the processing tool variables and/or processing parameters, or ratios of actually measured values (normalized to respective reference setpoints), or logarithms of such ratios, for example. In one illustrative embodiment, the number N+1 of the linearly independent set of basis functions $f_j(x)$ is at least 10 times smaller than M.

The function $F(a_0, a_1, \ldots, a_N)$ may be minimized as follows. The function $F(a_0, a_1, \ldots, a_N)$ is given by

$$F(a_0, a_1, \ldots, a_N) = \sum_{i=1}^{M} \varepsilon_i^2 = \sum_{i=1}^{M} \left[y(x_i) - y_i\right]^2,$$

and setting the partial derivatives of $F(a_0, a_1, \ldots, a_N)$ with respect to $a_j$, for j = 0, 1, ..., N, equal to zero gives

$$\frac{\partial F(a_0, a_1, \ldots, a_N)}{\partial a_j} = 2 \sum_{i=1}^{M} \left[y(x_i) - y_i\right] f_j(x_i) = 2 \sum_{i=1}^{M} \left[\sum_{k=0}^{N} a_k f_k(x_i) - y_i\right] f_j(x_i) = 0,$$

for j = 0, 1, ..., N, since $f_j(x_i)$ is the coefficient of $a_j$ in the representation $y(x_i) = \sum_{k=0}^{N} a_k f_k(x_i)$. Simplifying gives

$$\sum_{k=0}^{N} a_k \left[\sum_{i=1}^{M} f_k(x_i) f_j(x_i)\right] = \sum_{i=1}^{M} f_j(x_i)\, y_i, \quad \text{or} \quad \sum_{k=0}^{N} a_k S_{k,j} = T_j,$$

for j = 0, 1, ..., N, where $\sum_{i=1}^{M} f_k(x_i) f_j(x_i) \equiv S_{k,j}$ and $\sum_{i=1}^{M} f_j(x_i)\, y_i \equiv T_j$, respectively. There are N+1 equations $\sum_{k=0}^{N} a_k S_{k,j} = T_j$, for j = 0, 1, ..., N, also known as the normal equations, for the N+1 unknown parameters $a_k$, for k = 0, 1, ..., N, readily yielding a solution, provided that the determinant of the normal equations is not zero. This may be demonstrated by showing that the homogeneous equations $\sum_{k=0}^{N} a_k S_{k,j} = 0$ only have the trivial solution $a_k = 0$, for k = 0, 1, ..., N, which may be shown as follows. Multiply the jth homogeneous equation by $a_j$ and sum over all j,

$$\sum_{j=0}^{N} a_j \sum_{k=0}^{N} a_k S_{k,j} = \sum_{j=0}^{N} a_j \sum_{k=0}^{N} a_k \sum_{i=1}^{M} f_k(x_i) f_j(x_i) = \sum_{i=1}^{M} \left(\sum_{k=0}^{N} a_k f_k(x_i)\right) \left(\sum_{j=0}^{N} a_j f_j(x_i)\right),$$

but

$$\sum_{i=1}^{M} \left(\sum_{k=0}^{N} a_k f_k(x_i)\right) \left(\sum_{j=0}^{N} a_j f_j(x_i)\right) = \sum_{i=1}^{M} \left(y(x_i)\right)^2 = 0,$$

which would imply that $y(x_i) \equiv 0$, and, hence, that $a_k = 0$, for k = 0, 1, ..., N, the trivial solution. Therefore, the determinant of the normal equations is not zero, and the normal equations may be solved for the N+1 parameters $a_k$, for k = 0, 1, ..., N, the coefficients of the non-polynomial least-squares representation $y(x) = \sum_{j=0}^{N} a_j f_j(x)$ that may be fit to the M data points $(x_i, y_i)$, using the linearly independent set of N+1 functions $f_j(x)$ as the basis for the non-polynomial least-squares representation $y(x) = \sum_{j=0}^{N} a_j f_j(x)$.



If the data points $(x_i, y_i)$ are not equally reliable for all M, it may be desirable to weight the data by using non-negative weighting factors $w_i$. The function $F(a_0, a_1, \ldots, a_N)$ may be minimized as follows. The function $F(a_0, a_1, \ldots, a_N)$ is given by

$$F(a_0, a_1, \ldots, a_N) = \sum_{i=1}^{M} w_i \varepsilon_i^2 = \sum_{i=1}^{M} w_i \left[y(x_i) - y_i\right]^2,$$

and setting the partial derivatives of $F(a_0, a_1, \ldots, a_N)$ with respect to $a_j$, for j = 0, 1, ..., N, equal to zero gives

$$\frac{\partial F(a_0, a_1, \ldots, a_N)}{\partial a_j} = 2 \sum_{i=1}^{M} w_i \left[y(x_i) - y_i\right] f_j(x_i) = 2 \sum_{i=1}^{M} w_i \left[\sum_{k=0}^{N} a_k f_k(x_i) - y_i\right] f_j(x_i) = 0,$$

for j = 0, 1, ..., N, since $f_j(x_i)$ is the coefficient of $a_j$ in the representation $y(x_i) = \sum_{k=0}^{N} a_k f_k(x_i)$. Simplifying gives

$$\sum_{k=0}^{N} a_k \left[\sum_{i=1}^{M} w_i f_k(x_i) f_j(x_i)\right] = \sum_{i=1}^{M} w_i f_j(x_i)\, y_i, \quad \text{or} \quad \sum_{k=0}^{N} a_k S_{k,j} = \sum_{i=1}^{M} w_i f_j(x_i)\, y_i = T_j,$$

for j = 0, 1, ..., N, where $\sum_{i=1}^{M} w_i f_k(x_i) f_j(x_i) \equiv S_{k,j}$ and $\sum_{i=1}^{M} w_i f_j(x_i)\, y_i \equiv T_j$, respectively. There are N+1 equations $\sum_{k=0}^{N} a_k S_{k,j} = T_j$, for j = 0, 1, ..., N, also known as the normal equations, including the non-negative weighting factors $w_i$, for the N+1 unknown parameters $a_k$, for k = 0, 1, ..., N, readily yielding a solution, provided that the determinant of the normal equations is not zero. This may be demonstrated by showing that the homogeneous equations $\sum_{k=0}^{N} a_k S_{k,j} = 0$ only have the trivial solution $a_k = 0$, for k = 0, 1, ..., N, which may be shown as follows. Multiply the jth homogeneous equation by $a_j$ and sum over all j,

$$\sum_{j=0}^{N} a_j \sum_{k=0}^{N} a_k S_{k,j} = \sum_{i=1}^{M} w_i \left(\sum_{k=0}^{N} a_k f_k(x_i)\right) \left(\sum_{j=0}^{N} a_j f_j(x_i)\right) = \sum_{i=1}^{M} w_i \left(y(x_i)\right)^2 = 0,$$

which would imply that $y(x_i) \equiv 0$, and, hence, that $a_k = 0$, for k = 0, 1, ..., N, the trivial solution. Therefore, the determinant of the normal equations is not zero, and the normal equations, including the non-negative weighting factors $w_i$, may be solved for the N+1 parameters $a_k$, for k = 0, 1, ..., N, the coefficients of the non-polynomial least-squares representation $y(x) = \sum_{j=0}^{N} a_j f_j(x)$ that may be fit to the M data points $(x_i, y_i)$, using the linearly independent set of N+1 functions $f_j(x)$ as the basis for the non-polynomial least-squares representation $y(x) = \sum_{j=0}^{N} a_j f_j(x)$, and including the non-negative weighting factors $w_i$.
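The weighted normal equations $\sum_k a_k S_{k,j} = T_j$ can be assembled for any set of basis functions. The sketch below is a hedged illustration: the basis (a constant, a sine, and a decaying exponential), the synthetic data, the weights, and the names `solve_linear` and `fit_basis` are all assumptions chosen for the example, and setting every weight to 1 recovers the unweighted case.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_basis(points, basis, weights=None):
    """Weighted least-squares coefficients a_j for y(x) = sum_j a_j f_j(x)."""
    if weights is None:
        weights = [1.0] * len(points)  # unweighted case
    n = len(basis)
    # S[j][k] = sum_i w_i f_k(x_i) f_j(x_i);  T[j] = sum_i w_i f_j(x_i) y_i
    s = [[sum(w * basis[k](x) * basis[j](x)
              for (x, _), w in zip(points, weights))
          for k in range(n)] for j in range(n)]
    t = [sum(w * basis[j](x) * y for (x, y), w in zip(points, weights))
         for j in range(n)]
    return solve_linear(s, t)


# Hypothetical basis and synthetic, noise-free data y = 2 + 3 sin(x) - exp(-x).
basis = [lambda x: 1.0, math.sin, lambda x: math.exp(-x)]
points = [(0.5 * i, 2 + 3 * math.sin(0.5 * i) - math.exp(-0.5 * i))
          for i in range(10)]
weights = [1 + i % 3 for i in range(10)]  # arbitrary positive weights

unweighted = fit_basis(points, basis)
weighted = fit_basis(points, basis, weights)
print(unweighted, weighted)
```

Because the synthetic data are noise-free, both fits recover the same coefficients; with noisy data the weights would pull the fit toward the points judged more reliable.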

In an adaptive control strategy, according to various illustrative embodiments of the present invention, an online system identification scheme runs along with the controller and constantly adjusts the model so that the model mimics the true behavior of the system. One difficult task in this situation is determining whether observed errors in the output are due to errors in accounting for tool differences or for product differences. The following discussion will outline a scheme for deciding which model parameters are in error and performing the correct model updates.

We begin with a simple run-to-run controller for a single process and expand to the case of multiple products and tools. Standard observability tests on linear process models will be used for the purposes of illustration.

As an example, consider a simple etch or polish process where the objective is to reach a desired removal on each run. A simplified model for this process is $x = r \cdot t$, where x represents a thickness removal, r is a time-averaged rate, and t is the time of processing.

In the adaptive control formulation, the estimate of the rate is adjusted from run to run using online system identification. For simplicity and ease of analysis, the model will be linearized here and converted to a state space representation. If the model is linearized about a nominal rate $r_0$ and time $t_0$, then the equation for the deviation y from the nominal removal $y_0$ is

$$y = r_0 \cdot t + r \cdot t_0, \qquad (1)$$

where t and r represent deviations from the nominal time and rate, respectively. Then, the model is converted to state space representation,

$$x_{k+1} = A x_k + B u_k, \qquad (2a)$$

$$y_k = C x_k, \qquad (2b)$$

where x is a vector of states, y is a vector of measured outputs, and u is the vector of inputs. The A and B matrices convey how the states and inputs affect future values of the states. The C matrix maps the current values of the states into the outputs that are actually measured. In the current example, the state vector x contains $x_{adj}$, the change in removal caused by the time adjustment, and r, the deviation from the nominal rate $r_0$. The measurement vector y contains only y, the deviation from the nominal removal, and the input vector u contains only t, the deviation from the nominal time $t_0$.
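One way to see how equation (1) fits the state space form (2a)-(2b) is to write out a candidate set of matrices and step the model once. The matrices below are an assumption made for illustration, chosen only so that the output reproduces $y = r_0 \cdot t + r \cdot t_0$ with the state vector $(x_{adj}, r)$; they are not quoted from the text, and the numeric values are hypothetical.

```python
def make_model(r0, t0):
    """Assumed A, B, C for the state (x_adj, r) and the input t."""
    A = [[0.0, 0.0],   # x_adj is rebuilt from the input on every run
         [0.0, 1.0]]   # the rate deviation r persists from run to run
    B = [[r0],         # x_adj = r0 * t
         [0.0]]        # the input does not change the rate deviation
    C = [[1.0, t0]]    # y = x_adj + t0 * r
    return A, B, C


def step(A, B, state, u):
    """x_{k+1} = A x_k + B u_k for a single scalar input u."""
    return [sum(a * s for a, s in zip(row, state)) + b[0] * u
            for row, b in zip(A, B)]


def output(C, state):
    """y_k = C x_k for a single measured output."""
    return sum(c * s for c, s in zip(C[0], state))


r0, t0 = 10.0, 60.0          # hypothetical nominal rate and nominal time
rate_deviation = 0.5         # hypothetical r
time_deviation = -2.0        # hypothetical t chosen by the controller

A, B, C = make_model(r0, t0)
state = step(A, B, [0.0, rate_deviation], time_deviation)
y = output(C, state)
print(state, y)  # y should match r0 * t + r * t0 from equation (1)
```

With this choice the measured deviation after the run equals the linearized model of equation (1), and the rate deviation is carried unchanged into the next run, where an estimator can adjust it.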

This model is adequate for control of a single process. The single rate estimate is assumed to apply for 
every run, and it is adjusted after each measurement. In a high-volume manufacturing environment, however, 
additional complexity is added because there are multiple tools and products. In this work, each product and tool 
combination is called the context. The control objective is to have each run at the target, regardless of which 
product and tool combination is running. A simple approach is to assume that the one set of states applies to all 
processing contexts. The drawback of this method in an environment with several contexts is that the rates 
associated with each process can be drastically different from each other. When this happens, each switch to a 
new context appears as a step disturbance to the controller, as shown in Figure 24, as the controller has no 
understanding of why the rate would be changing so much. 

For example, in many applications, it is quickly observed that different products will have very different 
apparent reaction rates. However, the rate can drift from batch to batch, even if only one product is being made. 
This can be caused by reactor fouling, degradation of consumable materials, process leaks, and the like. Simply 
tracking an estimate for r from run to run is not acceptable because each switch to a different product appears as a 
step change. As shown in Figure 24, a second product was run from batches 6 through 15, and 
reactor fouling caused the rate to continually decay over the course of the simulation. 
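
The step-disturbance effect can be reproduced with a few lines of simulation. The sketch below is illustrative 
only: the rates, the target, the filter weight `lam`, and the function name `simulate_single_rate` are assumed 
values and names, not taken from the simulation behind Figure 24, and reactor fouling is left out for brevity. 

```python
def simulate_single_rate(true_rates, target=100.0, lam=0.5, r_hat=1.0):
    """Run-to-run control with one rate estimate shared by every product.

    true_rates holds the true removal rate for each batch (hypothetical values).
    Returns the error (removal - target) observed on each batch.
    """
    errors = []
    for r_true in true_rates:
        t = target / r_hat               # choose the time from the current estimate
        removal = r_true * t             # what the process actually removes
        errors.append(removal - target)
        r_meas = removal / t             # apparent rate seen by the controller
        r_hat += lam * (r_meas - r_hat)  # exponentially weighted update
    return errors


# Product A on batches 1-5 and 16-20, product B on batches 6-15.
rates = [1.0] * 5 + [0.7] * 10 + [1.0] * 5
errors = simulate_single_rate(rates)
for batch, err in enumerate(errors, start=1):
    print(f"batch {batch:2d}: error {err:+7.2f}")
```

The error is zero until batch 6, jumps at the product switch, decays as the shared estimate adapts, and jumps 
again at batch 16 when the first product returns, because the estimate has by then forgotten the first product. 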

Another illustrative method that is easy to implement is to group runs with similar contexts together so 
that they share parameter estimates. In such a method, there is no need to identify product and tool biases 
separately from each other. Each combination simply has its own rate estimate and updates this estimate based 
only on measurements from runs under that context. This method has the drawback, however, that a disturbance 
to one tool, for instance, has to be recognized by every context where that tool is used. This can be 
disadvantageous in a large system because of the large number of runs that would miss their targets while the 
different contexts were updating their parameter estimates. That information should be immediately or rapidly 
shared between all contexts that the disturbance affects. 

There are cases where biases can be determined to be caused by different parts of the processing context. 
One example is that tool-to-tool variation is repeatable regardless of the products being run, and 
product-to-product variation is consistent even when run on different tools. To take advantage of this observation, 
extra terms may be added to the model. For a CMP process, it makes sense to scale the rate for different products. 
This is largely because the removal rate is dependent on the features of the surface in contact, and different 
products will have different pattern densities. So, the equation used here for the removal is x = r·f·t, 
where x represents a removal, r is a time-averaged rate constant for the tool, f is a product-specific rate 
scaling factor, and t is the time of processing. This relation is similar to Preston's equation describing a 
polishing process, 



\[ \frac{\Delta x}{\Delta t} = \frac{K_p F v}{A}, \qquad (4) \]

where Δx is the removal, Δt is the time of processing, K_p is a rate constant, v is the surface velocity, F is 
the force applied, and A is the surface area in contact. 

When linearized about a nominal r_0, f_0, and t_0, the equation for the deviation y from the nominal removal 
becomes 

\[ y = r_0 \cdot f_0 \cdot t + r \cdot f_0 \cdot t_0 + r_0 \cdot f \cdot t_0, \qquad (5) \]

where t, r, and f represent deviations from the nominal time, tool rate constant, and product scaling 
factor, respectively. The following state space representation includes the estimates for the two model parameters 
as states. 







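\[ \begin{bmatrix} x_{adj} \\ r \\ f \end{bmatrix}_{k+1} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{adj} \\ r \\ f \end{bmatrix}_k + \begin{bmatrix} r_0 \cdot f_0 \\ 0 \\ 0 \end{bmatrix} t_k, \qquad (6a) \]

\[ y_k = \begin{bmatrix} 1 & f_0 \cdot t_0 & r_0 \cdot t_0 \end{bmatrix} \begin{bmatrix} x_{adj} \\ r \\ f \end{bmatrix}_k. \qquad (6b) \]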



So, the question that arises is whether or not the two model parameters r and f can be uniquely 
identified. For a time-invariant linear system like the one here, the test for nonsingularity of the observability 
Gramian can be performed by computing the rank of 

\[ O = \begin{bmatrix} C^T & A^T C^T & (A^T)^2 C^T & \cdots \end{bmatrix}, \qquad (7) \]

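where as many terms in the matrix are included as needed to try to achieve full rank, if possible. For the 
system above, 

\[ O = \begin{bmatrix} 1 & 0 & 0 \\ f_0 \cdot t_0 & f_0 \cdot t_0 & f_0 \cdot t_0 \\ r_0 \cdot t_0 & r_0 \cdot t_0 & r_0 \cdot t_0 \end{bmatrix}. \qquad (8) \]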



This matrix is not full rank, so the system as it is currently defined is not observable. Thus it is not 
possible to uniquely identify the model parameters using only run data from a single context. 

This result is intuitive, for in order to be able to identify the product-to-product and tool-to-tool 
dependencies, it is useful to have a model that includes all the different processing contexts. Information about 
different products should be shared between tools, and vice versa. This may entail looking at the entire collection 
of processes as a whole, rather than concentrating on individual contexts one at a time. 
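
The rank deficiency of the single-context system can also be read directly from Equation (8). The short argument 
below is a sketch that uses only quantities already defined, assuming r_0, f_0, and t_0 are nonzero: 

```latex
\mathrm{row}_3(O) = \frac{r_0}{f_0}\,\mathrm{row}_2(O)
\quad\Longrightarrow\quad \operatorname{rank}(O) = 2 < 3,
\qquad
y_k - x_{adj,k} = t_0 \left( f_0\, r + r_0\, f \right).
```

Only the combination f_0·r + r_0·f reaches the output, so any pair (r, f) giving the same value of that 
combination explains data from a single context equally well. 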

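Consider a hypothetical process where there are two tools (1 and 2) and three products (A, B, and C). 
They can be run in any product/tool combination. Using the linearized form above and assuming that there is a 
single "nominal" point for all combinations, the deviations from nominal removal for each context can be 
described by these equations. 

\[ y_{1A} = r_0 \cdot f_0 \cdot t + r_1 \cdot f_0 \cdot t_0 + r_0 \cdot f_A \cdot t_0 \qquad (9a) \]
\[ y_{1B} = r_0 \cdot f_0 \cdot t + r_1 \cdot f_0 \cdot t_0 + r_0 \cdot f_B \cdot t_0 \qquad (9b) \]
\[ y_{1C} = r_0 \cdot f_0 \cdot t + r_1 \cdot f_0 \cdot t_0 + r_0 \cdot f_C \cdot t_0 \qquad (9c) \]
\[ y_{2A} = r_0 \cdot f_0 \cdot t + r_2 \cdot f_0 \cdot t_0 + r_0 \cdot f_A \cdot t_0 \qquad (9d) \]
\[ y_{2B} = r_0 \cdot f_0 \cdot t + r_2 \cdot f_0 \cdot t_0 + r_0 \cdot f_B \cdot t_0 \qquad (9e) \]
\[ y_{2C} = r_0 \cdot f_0 \cdot t + r_2 \cdot f_0 \cdot t_0 + r_0 \cdot f_C \cdot t_0 \qquad (9f) \]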



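This entire system can be combined into a single state space model, 

\[ \begin{bmatrix} x_{adj} \\ r_1 \\ r_2 \\ f_A \\ f_B \\ f_C \end{bmatrix}_{k+1} = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{adj} \\ r_1 \\ r_2 \\ f_A \\ f_B \\ f_C \end{bmatrix}_k + \begin{bmatrix} r_0 \cdot f_0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} t_k, \qquad (10a) \]

\[ \begin{bmatrix} y_{1A} \\ y_{1B} \\ y_{1C} \\ y_{2A} \\ y_{2B} \\ y_{2C} \end{bmatrix}_k = \begin{bmatrix} 1 & f_0 t_0 & 0 & r_0 t_0 & 0 & 0 \\ 1 & f_0 t_0 & 0 & 0 & r_0 t_0 & 0 \\ 1 & f_0 t_0 & 0 & 0 & 0 & r_0 t_0 \\ 1 & 0 & f_0 t_0 & r_0 t_0 & 0 & 0 \\ 1 & 0 & f_0 t_0 & 0 & r_0 t_0 & 0 \\ 1 & 0 & f_0 t_0 & 0 & 0 & r_0 t_0 \end{bmatrix} \begin{bmatrix} x_{adj} \\ r_1 \\ r_2 \\ f_A \\ f_B \\ f_C \end{bmatrix}_k, \qquad (10b) \]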



where the states consist of the adjustment (x_adj), tool biases (r_1 and r_2), and product biases (f_A, f_B, and f_C). 
This model is of a hypothetical situation where all product/tool combinations run simultaneously with the same 
input settings. Although this situation would almost never arise in practice, it is useful from the standpoint of 
understanding the interactions between the different processing contexts. It is clear, for instance, that the single f_A 
product factor is used for all runs of product A, regardless of processing tool. 

The observability test as calculated here, 

\[ O = \begin{bmatrix} C^T & A^T C^T \end{bmatrix}, \]

is rank deficient by one, so the system is not observable in its current form. Here, only the first two 
terms in the matrix in Equation (7) suffice. The reason for this is that all runs are convoluted by both a product 
bias and a tool bias. An additional constraint is needed to fix one variable or the other. As an example, it may be 
possible to experimentally measure the tool parameter by qualifying the tools. This would add extra system 
outputs y_1 = r_1 and y_2 = r_2. 

When such experiments are not an option, it is also possible to simply select a reference tool or product 
that has the nominal bias. One disadvantage of this method is that it may be difficult to identify a reference tool 
or product in a manufacturing environment that is constantly changing. 

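If the qualification experiments are added to the example system above, the new output equation for the 
combined system is 

\[ \begin{bmatrix} y_{1A} \\ y_{1B} \\ y_{1C} \\ y_{2A} \\ y_{2B} \\ y_{2C} \\ y_1 \\ y_2 \end{bmatrix}_k = \begin{bmatrix} 1 & f_0 t_0 & 0 & r_0 t_0 & 0 & 0 \\ 1 & f_0 t_0 & 0 & 0 & r_0 t_0 & 0 \\ 1 & f_0 t_0 & 0 & 0 & 0 & r_0 t_0 \\ 1 & 0 & f_0 t_0 & r_0 t_0 & 0 & 0 \\ 1 & 0 & f_0 t_0 & 0 & r_0 t_0 & 0 \\ 1 & 0 & f_0 t_0 & 0 & 0 & r_0 t_0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_{adj} \\ r_1 \\ r_2 \\ f_A \\ f_B \\ f_C \end{bmatrix}_k. \]

The observability matrix for this new system, 

\[ O = \begin{bmatrix} C^T & A^T C^T \end{bmatrix}, \]

is full rank, so this system is observable. Here again, only the first two terms in the matrix in 
Equation (7) suffice. However, the system here is still the rare case where all possible runs can happen 
simultaneously. In practice, one run happens at a time. It is possible to determine the appropriate way to update 
the model states after each run. 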
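
Both rank results can be checked numerically. The sketch below builds the output matrix for the two-tool, 
three-product example with and without the qualification rows and computes the rank with exact arithmetic. The 
nominal values `r0`, `f0`, `t0` and the helper names `rank` and `observability_rows` are assumptions made for 
illustration. 

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix given as a list of rows, using exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def observability_rows(c_rows):
    """Stack C and C*A, the transpose of [C^T  A^T C^T], which has the same rank.

    A = diag(0, 1, 1, 1, 1, 1), so C*A is C with its first column zeroed.
    """
    return c_rows + [[0] + row[1:] for row in c_rows]


r0, f0, t0 = 2, 3, 5  # hypothetical nominal values; any nonzero values behave the same

# State order: [x_adj, r1, r2, fA, fB, fC]; one row per tool/product context.
context_rows = []
for tool in range(2):
    for product in range(3):
        row = [1, 0, 0, 0, 0, 0]
        row[1 + tool] = f0 * t0
        row[3 + product] = r0 * t0
        context_rows.append(row)

qual_rows = [[0, 1, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0]]  # direct measurements of r1 and r2

rank_without_quals = rank(observability_rows(context_rows))
rank_with_quals = rank(observability_rows(context_rows + qual_rows))
print(rank_without_quals, rank_with_quals)
```

With six states, the production contexts alone give rank 5, and adding the two qualification outputs gives rank 6, 
in agreement with the text. 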

Structurally, this system resembles a real system where the different measurements are sampled at 
different frequencies. This multirate sampling problem has been given treatment in the recent literature. With 
such a system, the observability changes over time as different combinations of measurements are available. The 
underlying implicit requirement, though, is that the system be observable in the limiting case where all possible 
measurements are made at every time step. 

At this point, a model for a hypothetical system with similar dynamics to the real system is available. 
The next step is to define a control law and an observer that can map the real process into the model space. 

The control objective is to drive each run to target, regardless of processing context. This is 
accomplished here by using a dead-beat control law using the current best guesses of the model states. A 
particular processing context is represented by a single row in the output matrix C. For example, the deviation 
from nominal removal for product B running on tool 2 is given by equation (9e), which corresponds to the 5th row 
of the output matrix. For ease of notation, the row of the output matrix corresponding to the current context will 
be denoted C_con. Then, the desired input u_des satisfies 

\[ y_{con,des} = C_{con} (A x + B u_{des}), \qquad (14) \]

where y_con,des is the desired deviation from nominal removal for this processing context. Solving for the 
input u_des yields 

\[ u_{des} = (C_{con} B)^{-1} (y_{con,des} - C_{con} A x). \qquad (15) \]

This equation gives the input for any processing context, given the current estimates of the model 
parameters. 
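
As a worked instance of Equation (15), the sketch below computes the time deviation for product B on tool 2. 
The nominal values, the state estimates, and the function name `deadbeat_input` are hypothetical; the input is 
scalar, so the inverse in Equation (15) reduces to a division. 

```python
def deadbeat_input(c_con, a_diag, b, x_hat, y_des):
    """Return u_des = (C_con B)^-1 (y_des - C_con A x) for a scalar input.

    a_diag holds the diagonal of A, which is a diagonal matrix in this model.
    """
    c_a_x = sum(c * a * x for c, a, x in zip(c_con, a_diag, x_hat))
    c_b = sum(c * bi for c, bi in zip(c_con, b))
    return (y_des - c_a_x) / c_b


r0, f0, t0 = 2.0, 1.0, 60.0                # hypothetical nominal values
a_diag = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0]    # x_adj is overwritten on every run
b = [r0 * f0, 0.0, 0.0, 0.0, 0.0, 0.0]     # the time deviation only drives x_adj

# State order: [x_adj, r1, r2, fA, fB, fC]; the values are illustrative estimates.
x_hat = [0.0, 0.1, -0.2, 0.0, 0.05, 0.0]
c_2b = [1.0, 0.0, f0 * t0, 0.0, r0 * t0, 0.0]  # product B on tool 2, Equation (9e)

u_des = deadbeat_input(c_2b, a_diag, b, x_hat, y_des=0.0)

# Predicted deviation from nominal removal when u_des is applied.
predicted = sum(c * (a * x + bi * u_des)
                for c, a, x, bi in zip(c_2b, a_diag, x_hat, b))
print(u_des, predicted)
```

With these numbers the slow tool costs 12 units of removal and the product factor gives back 6, so the run is 
extended by about 3 time units at the nominal rate r_0·f_0 = 2 and the predicted deviation returns to zero. 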

The observer must map measurements of the real process into the model so that state updates can occur. 
The design of the observer is not as simple as the control law. The reason is that in general, a single measurement 
from a process run is convoluted by both the tool bias and the product bias. The prediction error should be 
distributed between the parameter updates. The new information from each measurement must be merged into 
the existing information. 

One simple way to accomplish this is to imagine the different elements of the output vector y as 
individual sensors which remain fixed at their last values until a new measurement changes them. The benefit of 
this approach is that a conventional state observer may be designed by taking the entire system into account. An 
observer gain matrix L can be chosen such that 

\[ \hat{x}_{k+1} = A \hat{x}_k + B u_k + L (y_k - C \hat{x}_k). \qquad (16) \]

The observer matrix maps the differences between the measured and predicted outputs into changes to 
the state estimates. However, there is a major drawback with leaving past outputs fixed at their last values. When 
the input changes, those outputs are not really valid anymore, since they are no longer representative of the current 
state of the process. If the measurements were current, they would change in response to the new input. 

At the opposite end of the spectrum from leaving the inactive outputs constant, all outputs that are not 
measured at a time step could be set to the values that would be predicted from the current state estimates. This 
method results in all old measurements being ignored when the state update is performed. 
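
When the unmeasured outputs are set to their predicted values, every entry of the prediction error is zero except 
the one that was measured, so only the corresponding column of the gain acts. The sketch below illustrates this 
for the bias states alone; the contribution of the time adjustment is assumed to be known from the applied input 
and already subtracted. The normalized, damped gain (`mu`), the nominal values, and the true biases are 
assumptions chosen for illustration and are not the observer gain L of Equation (16). 

```python
import itertools

R0, F0, T0 = 2.0, 1.0, 60.0  # hypothetical nominal rate, scaling factor, and time

# Bias states are ordered [r1, r2, fA, fB, fC]; each row maps them to one output.
CONTEXT_ROWS = []
for tool in range(2):
    for product in range(3):
        row = [0.0] * 5
        row[tool] = F0 * T0
        row[2 + product] = R0 * T0
        CONTEXT_ROWS.append(row)
QUAL_ROWS = [[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def update(x_hat, row, y_meas, mu=0.5):
    """Correct the estimates using only the single output measured on this run."""
    innovation = y_meas - dot(row, x_hat)     # measured minus predicted
    scale = mu * innovation / dot(row, row)   # normalized, damped step (assumed gain)
    return [x + scale * c for x, c in zip(x_hat, row)]


def estimate(rows, x_true, sweeps=100):
    """Cycle through the available measurement types, starting from zero biases."""
    x_hat = [0.0] * len(x_true)
    for row in itertools.islice(itertools.cycle(rows), sweeps * len(rows)):
        x_hat = update(x_hat, row, dot(row, x_true))  # noise-free measurement
    return x_hat


def max_error(x_hat, x_true):
    return max(abs(a - b) for a, b in zip(x_hat, x_true))


x_true = [0.10, -0.20, 0.00, 0.05, -0.03]  # hypothetical true biases
with_quals = estimate(CONTEXT_ROWS + QUAL_ROWS, x_true)
without_quals = estimate(CONTEXT_ROWS, x_true)
print(max_error(with_quals, x_true), max_error(without_quals, x_true))
```

In this noise-free setting the estimates converge to the true biases when qualification events are included. 
Without them the predictions still match the data, but the estimates settle on a different combination of tool and 
product biases, which is the unobservable direction identified by the rank test. 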

A series of Matlab™ simulations were run in order to illustrate the concepts of the plant-wide model. 
The first series of simulations were all run under the conditions specified in the example above. In this example, 
there were two processing tools and three products. Tool qualification events were available to directly measure 
the tool-specific model parameters. The starting point for the state estimators was that all tools and products were 
assumed to be matched at the nominal values. The control objective was to keep all the different tool and product 
combinations running at the nominal removal y_0. 

One important thing to note about these tests is that they begin from a starting point where the controller 
knows nothing about the system and has to identify all products and tools at once. While it is important for the 
controller to be able to do this, in normal steady-state operation the tool and product biases will be known fairly 
accurately, and disturbances will affect only a subset of the process. For this reason, most of these tests inject a 
disturbance into the system after the controller has stabilized the process in order to see how the controller reacts 
to an isolated disturbance. 

The first simulation establishes a baseline by illustrating the best possible scenario. This is the 
hypothetical case mentioned previously where all possible runs happen simultaneously at each time step so that 
all of the measurements can be used to update the state estimates. White noise was added to the process output 
during the simulation, and a step disturbance was added to tool 2 at run 30 (also known as time step 30). 
Figure 25 contains the results of the test. Figure 25 schematically illustrates percent deviation from target in this 
hypothetical best case scenario. In this hypothetical case, the controller is able to quickly reach the target and 
reject the disturbance. 

The most important result from this test is that the system as a whole, in the case where the controller 
has the maximum amount of information available, performs well. All other tests will deal with more realistic 
situations where the controller only has a subset of that information. The subsequent runs will attempt to illustrate 
some of the factors that are important in determining if the controller can function with the reduced set of 
information. 

The next situation tested is where the runs occur only one at a time, and the outputs are held at their last 
values until they are updated. For testing purposes, on each run, either a random product was run on a random 
tool, or a qualification event was logged. All eight possible scenarios (six production contexts and two 
qualification events) had equal probability of occurring at each run. As mentioned previously, this situation 
caused problems because the outputs must change when the input changes to provide useful information. In this 
test, the input is changing on every run, but only one output is being updated each time. Thus, the outputs used 
for feedback are not representative of the true state of the process. Figure 26 shows the results of this experiment. 
Figure 26 schematically illustrates percent deviation from target in this "fixed outputs" case. 

This configuration was not able to control the process very well. The stale information contained in the 
output vector adversely affects the state observer because the measurements do not appear to respond to changes in 
the inputs. This causes the controller to try to compensate for them repeatedly, leading to instability. 

The next simulation case tested used the current model estimates of the states to estimate values for the 
measurements that were missing. As a result, each state update only contained information from the run that was 
being measured. The testing conditions were similar to the previous simulation, with the random selection of 
runs. However, a step disturbance to the rate on tool 2 was injected at run or time step 80. The results of the test 
can be seen in Figure 27. Figure 27 schematically illustrates percent deviation from target in this "predicted 
outputs" case. This scheme reaches the target and successfully rejects the disturbance. However, the response is 
very sluggish compared to the case where all the information about the process is measured. 

The test depicted in Figure 28 uses the same controller as the previous one, but a different set of rules was 
applied to generate the sequence of runs. Figure 28 schematically illustrates percent deviation from target in this 
"predicted outputs" case with extra qualifications. The tools were qualified twice as often as in the previous case 
in order to determine the effect on the controller. In this test, the system equilibrated more quickly and rejected 
the disturbance more easily than in the prior case. 

To test the scalability of the system, the last test used 6 tools and 7 products. Figure 29 schematically 
illustrates percent deviation from target in this large-scale system case. The system began under control and had 
to deal with multiple disturbances. A step disturbance was added to tool 1 at run or time step 50, and product 6 
had a step disturbance at run or time step 150. The results in Figure 29 demonstrate that the controller 
successfully rejects the disturbances over a very short number of runs. 

This is important because it demonstrates the power of a model that takes the entire system into account. 
For instance, the bias on tool 1 could be detected while product A was running on the tool. In that case, both the 
estimates for r_1 and f_A would be adjusted slightly. The next run of any product on tool 1 would still exhibit the 
bias (although to a lesser extent), and the estimate for r_1 would be adjusted even closer to the new correct value. 
The next run of product A on any tool would move the estimate for f_A back in the direction of the correct value. 

In this configuration, each state estimate update is only using information from the measurements of the 
current run. Since all older information is being ignored, it is important for the observer to be fairly damped. A 
qualitative example is instructive. If a run of product A on tool 1 had a removal that was higher than predicted, 
then one or both of the estimates for r_1 and f_A are too low. In order to correctly update the states, it is necessary to 
look at the results of other runs of product A on different tools or runs of other products on tool 1. However, the 
chosen update strategy only allows the update to be based on the results of the current run. By having the 
observer only make small changes from each run, the states will at least move in the correct directions, and all the 
runs together will move the state estimates to the correct values. Clearly, the ideal solution to this problem would 
attempt to combine as much information as possible from old measurements with the new values when doing the 
state estimate. 

The multiple processing contexts commonly seen in large scale semiconductor manufacturing present an 
interesting control challenge. A control and estimation approach was developed to examine the entire processing 
environment that a controller sees as a whole instead of focusing only on individual contexts one at a time. 
Simulations run under these conditions show that the idea does have merit. Given enough information, a 
plant-wide controller is able to handle the entire system composed of all the different processes. 

Several factors influenced the performance of the controller. It relies on having the most information it 
can gain about the process. The decision about what to do on each run to the system outputs that are not 
measured has a drastic effect on the performance. Also, the control response is improved when data can be 
obtained from qualification events mixed in with the production runs. These provide direct measurements of 
important model parameters. 

One interesting feature of this system is that the model must be rebuilt whenever tools or products are 
added or subtracted. This is because the model accounts for the entire system at once. Although this may appear 
to be a drawback, it actually provides insight into the process which is not intuitive. Since the observer must be 
rebuilt based on the entire system each time the system changes, the properties of the feedback are dependent on 
the tool and product distribution. This means that an error detected on one processing context should be treated 
differently depending on the other contexts in the entire system. 

When the system is viewed as a whole, it is apparent that there is a great deal of information shared 
between different parts of the system. Since the performance of the controller is tied to the quality of the 
information it is able to extract, it would be beneficial to closely examine the effects of processing order and 
sampling plans. A sufficiently advanced controller would be able to prioritize certain runs and measurements on 
the basis of the information they would provide about the state of the system. Closely related to this is the 
concept of event-driven model-based control. Instead of viewing the process as a continuum, the states of the 
process, including the model parameter estimates, are affected by a series of discrete modeled events. 

In various illustrative embodiments, a model may be developed for the tool state (x) which is 
independent of the product. This tool state is an intrinsic rate for the tool. A change in this rate affects all 
products that run on the tool. 

The process state (x) is mapped to the product state (y) using the output equation y_k = C_k x_k. 

Then, the estimator is used to track the tool state (x), rather than product state (y). Inspection of the 
Kalman optimal filtering equations indicates that optimal observer gain is a function of the output mapping (C). 

\[ P = A_k \left[ P - P C_k^T \left( C_k P C_k^T + R \right)^{-1} C_k P \right] A_k^T + G Q G^T \]

So, by using offline analysis, the repeatable product dependence can be quantified to arrive at a new 
model for the rate r, r = r_0·k_p, where r_0 is the "intrinsic rate" of the processing tool, and k_p is the 
product-specific correction factor. 

The observer then estimates r_0 instead of r, by scaling the observed rate by each product-specific factor. 

In a situation where the product-specific factors are known exactly, the scheme described above works 
very well. Changes in the operation of the processing tool are observed regardless of which product is running. 
However, in a real manufacturing environment, several complications arise. For example, there can be several 
processing tools, new products appear, and experiments can be very expensive in terms of both raw materials and 
processing tool downtime. The impact here is that the product-specific factors are not always known a priori. 

The method above observes a single parameter (r_0), but it is necessary to find a way to quickly obtain 
estimates for new k_p. This can be done by observing the rate at each run, and updating the model parameters 
accordingly. The result of each run is a measurement of the apparent rate r. To estimate r_0 and k_p from the data 
(r), the model equation is used, 

\[ r = r_0 \cdot k_p. \]

Using a Taylor series approximation, 

\[ \Delta r = r_0 \cdot \Delta k_p + k_p \cdot \Delta r_0. \]

What this means is that an apparent change in the value of r can be expressed as a change in the 
estimates of r_0 and k_p. So, it is necessary to classify the changes (using an analysis of variance technique) in order 
to determine how to distribute the error between the two parameters. 

One method of using this estimator is to apply a linear filter to each parameter. 

The lambda values are varied in order to reflect the confidence in the parameter estimates. In situations 
where r_0 is expected to be changing, λ_r is high, and in situations where k_p is thought to be in error, λ_k is high. 

As an example, for a well-established product, there is a high degree of confidence that k_p is accurate. In 
addition, r_0 is known to drift over time. Thus, the relation λ_r >> λ_k is used. On the other hand, for a new 
product, there is little confidence in the value of k_p. It is expected that an inaccurate k_p will affect the rate more 
than the noise or drift in r_0, so λ_k >> λ_r is set. 
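
The role of the two lambda values can be illustrated with a short sketch. The exact filter equations are not 
legible in this copy, so the form below is an assumption: each parameter is moved, by its own weight, toward the 
value that would explain the measured rate if the other parameter were exactly right. The function name 
`update_estimates` and all numbers are hypothetical. 

```python
def update_estimates(r0, kp, r_meas, lam_r, lam_k):
    """Filter r0 and kp toward the values implied by one measured rate.

    The model is r = r0 * kp. lam_r and lam_k are weights between 0 and 1;
    a larger weight means less confidence in that parameter's current value.
    """
    r0_implied = r_meas / kp   # intrinsic rate if kp were exactly right
    kp_implied = r_meas / r0   # product factor if r0 were exactly right
    r0_new = lam_r * r0_implied + (1.0 - lam_r) * r0
    kp_new = lam_k * kp_implied + (1.0 - lam_k) * kp
    return r0_new, kp_new


# Same surprise (measured 3.3 where 2.0 * 1.5 = 3.0 was expected), two settings.
established = update_estimates(2.0, 1.5, 3.3, lam_r=0.5, lam_k=0.01)
new_product = update_estimates(2.0, 1.5, 3.3, lam_r=0.01, lam_k=0.5)
print(established, new_product)
```

For the well-established product almost all of the error is assigned to the tool rate r_0, while for the new 
product almost all of it is assigned to k_p, which mirrors the two cases described above. 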

Matlab™ simulations show that this scheme tracks the process very well. The simulations were run in 
the following way. The number of processing tools (n), the number of products (m), and the number of runs (p) 
were chosen beforehand. Each product was given a unique "real" value for k_p, and each tool was given a unique 
value for r_0. For each run, a random tool and product were chosen. A measurement was calculated by multiplying 
the correct r_0 and k_p together and adding random noise. Then, the r_0 for the chosen tool had an offset added to it, 
in order to simulate a drift over time. The parameter estimates were updated after each run as described above. In 
all cases, the estimates tracked to the real values of the parameters very quickly. 

It is interesting to note that choosing the products and tools randomly made the convergence faster than 
using long strings of runs of given products on certain tools (tool dedication). When tools are dedicated it is 
difficult to assign the error in the rate estimate appropriately between the two parameters. This seems to be 
related to the persistent excitation requirement in system identification theory, but it is an interesting result 
because traditionally tool dedication is thought to make process control easier. The best controller for this process 
will be able to address the dual control problem of simultaneous identification and control. Making the process 
choices involves a tradeoff between tightly tracking the targets and helping to characterize the process because 
the two objectives conflict with each other. 

In various illustrative embodiments related to polishing and/or etching, for example, referring to 
Figure 30, a simplified block diagram of an illustrative manufacturing system 3010 is provided. In the illustrated 
embodiment, the manufacturing system 3010 is adapted to fabricate semiconductor devices. Although the 
invention is described as it may be implemented in a semiconductor fabrication facility, the invention is not so 
limited and may be applied to other manufacturing environments. A network 3020 interconnects various 
components of the manufacturing system 3010, allowing them to exchange information. The illustrative 
manufacturing system 3010 includes a plurality of tools 3030-3080. Each of the tools 3030-3080 may be coupled 
to a computer (not shown) for interfacing with the network 3020. 

A process control server 3090 directs the high level operation of the manufacturing system 3010 by 
directing the process flow. The process control server 3090 monitors the status of the various entities in the 
manufacturing system 3010, including the tools 3030-3080. A database server 30100 is provided for storing data 
related to the status of the various entities and articles of manufacture (e.g., wafers) in the process flow. The 
database server 30100 may store information in one or more data stores 30110. The data may include pre-process 
and post-process metrology data, tool states, process flow activities (e.g., scheduled maintenance events, 
processing routes for lots of wafers), and the like. The distribution of the processing and data storage functions 
amongst the different computers is generally conducted to provide independence and a central information store. 
Of course, more or fewer computers may be used. 

An exemplary information exchange and process control framework suitable for use in the 
manufacturing system 3010 is an Advanced Process Control (APC) framework, such as may be implemented 
using the Catalyst system offered by KLA-Tencor, Inc. The Catalyst system uses Semiconductor Equipment and 
Materials International (SEMI) Computer Integrated Manufacturing (CIM) Framework compliant system 
technologies and is based on the Advanced Process Control (APC) Framework. CIM (SEMI E81-0699 - Provisional 
Specification for CIM Framework Domain Architecture) and APC (SEMI E93-0999 - Provisional Specification 
for CIM Framework Advanced Process Control Component) specifications are publicly available from SEMI. 

Portions of the invention and corresponding detailed description are presented in terms of software, or 
algorithms and symbolic representations of operations on data bits within a computer memory. These 
descriptions and representations are the ones by which those of ordinary skill in the art effectively convey the 
substance of their work to others of ordinary skill in the art. An algorithm, as the term is used here, and as it is 
used generally, is conceived to be a self-consistent sequence of steps leading to a desired result. The steps are 
those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities 
take the form of optical, electrical, or magnetic signals capable of being stored, transferred, combined, compared, 
and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer 
to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. 

It should be borne in mind, however, that all of these and similar terms are to be associated with the 
appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically 
stated otherwise, or as is apparent from the discussion, terms such as "processing" or "computing" or 
"calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, 
or similar electronic computing device, that manipulates and transforms data represented as physical, electronic 
quantities within the computer system's registers and memories into other data similarly represented as physical 
quantities within the computer system memories or registers or other such information storage, transmission or 
display devices. 

The tools 3030-3080 are grouped into sets of like tools, as denoted by lettered suffixes. A particular 
wafer or lot of wafers progresses through the tools 3030-3080 as it is being manufactured, with each tool 
3030-3080 performing a specific function in the process flow. Exemplary processing tools 3030-3080 include 
photolithography steppers, etch tools, deposition tools, polishing tools, rapid thermal processing tools, ion 
implantation tools, and the like. Some of the tools 3030-3080 may also be metrology tools adapted to measure 
characteristics (e.g., surface profiles) of the wafers being processed. In the illustrated embodiment, the set of tools 
3030A-3030C represent etch tools, and the set of tools 3070A-3070C represent polishing tools. Typically, the 
path a particular wafer or lot passes through the process flow varies. The process control server 3090 routes the 
individual lots through the process flow depending on the steps that need to be performed and the availabilities of 
the tools 3030-3080. A particular lot of wafers may pass through the same tool 3030-3080 more than once in its 
production (e.g., a particular etch tool 3030 may be used for more than one etch operation). 

The tools 3030-3080 are illustrated in a rank and file grouping for illustrative purposes only. In an actual 
implementation, the tools may be arranged in any order or grouping. Additionally, the connections between the 
tools in a particular group are meant to represent only connections to the network 3020, rather than 
interconnections between the tools. 

The process control server 3090 controls the path of a particular lot of wafers through the tools 
3030-3080. Based on process data, the process control server 3090 monitors the operating states of the tools 
3030-3080. The process data may include pre- and post-process measurements of wafers progressing through the 
tools 3030-3080. For example, if a particular polishing tool, e.g., 3070A, is operating in a state that favors 
center-fast polishing, the process control server 3090 notes that tendency. The process control server 3090 may also 
monitor the operating states of other tools, such as the etch tools 3030, to determine if the current state of the etch 
tool favors center-fast or center-slow etching. 

The process control server 3090 may initiate pre-processing and/or post-processing metrology events as 
necessary to determine the operating states of the tools 3030-3080. The data from the metrology events may be 
returned to the process control server 3090 (or some other computing resource on the network 3020) and 
analyzed. Alternatively, the process control server 3090 may access process data already collected and stored in 
the data store 30110. For example, pre-process and post-process metrology data may have been collected for 
various tools to generate statistical data for process control and/or fault detection. 

The process control server 3090 evaluates the current operating states of the tools 3030-3080 as it 
determines the particular routing of a lot of wafers through the process flow of the manufacturing system 3010. 
For example, prior to performing a polishing procedure on a particular lot, the process control server 3090 first 
determines the surface profile (e.g., dished or domed) of the wafers in the lot. The process control server 3090 
may initiate a metrology event to determine the surface profile or access the data store 30110 for the information. 
After determining the incoming surface profile, the process control server 3090 evaluates the current operating 
states of the polishing tools 3070A-3070C to determine which tool(s) have a tendency to polish in a manner 
complementary to the incoming surface profile. If the incoming surface profile is dished, the process control 
server 3090 selects a polishing tool 3070A-3070C operating in a center-slow state. Similarly, if the incoming 
surface profile is domed, the process control server 3090 selects a polishing tool 3070A-3070C operating in a 
center-fast state. 

A similar approach may be applied to an etch process. The process control server 3090 selects the
particular etch tool 3030A-3030C having an operating state complementary to the incoming surface profile. If the
incoming surface profile is dished, the process control server 3090 selects an etch tool 3030A-3030C operating in
a center-slow state. Similarly, if the incoming surface profile is domed, the process control server 3090 selects an
etch tool 3030A-3030C operating in a center-fast state.
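
The routing rule described for the polishing and etch tools can be sketched as a simple lookup. The following Python fragment is only an illustrative sketch: the function name select_tools and the dictionary of tool states are hypothetical, and the only element taken from the description is the pairing of a dished profile with a center-slow tool and a domed profile with a center-fast tool.

```python
from typing import Dict, List

# Pairing taken from the description: a dished incoming profile is routed to a
# center-slow tool, and a domed incoming profile to a center-fast tool.
COMPLEMENTARY_STATE = {"dished": "center-slow", "domed": "center-fast"}


def select_tools(incoming_profile: str, tool_states: Dict[str, str]) -> List[str]:
    """Return the tools whose operating state complements the incoming profile.

    `tool_states` maps a tool identifier to its current operating state
    (a hypothetical data layout chosen for this sketch).
    """
    if incoming_profile not in COMPLEMENTARY_STATE:
        raise ValueError(f"unknown surface profile: {incoming_profile!r}")
    wanted = COMPLEMENTARY_STATE[incoming_profile]
    # Sort so the result is deterministic regardless of dictionary order.
    return sorted(tool for tool, state in tool_states.items() if state == wanted)
```

For example, with tool_states of {"3070A": "center-fast", "3070B": "center-slow", "3070C": "center-slow"}, a dished lot would be offered tools 3070B and 3070C, while a domed lot would be offered tool 3070A. An empty result indicates that no tool is currently in a complementary state; how that case is handled is not addressed by this sketch.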

Any of the above-disclosed embodiments of a method according to the present invention enables the use
of parametric measurements sent from measuring tools to make supervisory processing adjustments, either
manually and/or automatically, to improve and/or better control the yield. Furthermore, many of the
above-disclosed embodiments of a method of manufacturing according to the present invention provide a
significant improvement in sampling methodology by treating sampling as an integrated part of the dynamic
control environment of Advanced Process Control (APC) systems. Rather than applying a static "optimum"
sampling rate, sampling is treated as a dynamic variable that is increased or decreased based upon (1) situational
information, such as the amount and/or rate of change in the variation in recent data, (2) events, such as
maintenance and/or changes in the process upstream of the operation, and/or (3) requirements of closed-loop
run-to-run controllers in their schemes to identify control model parameters. Additionally, any of the
above-disclosed embodiments of a method of manufacturing according to the present invention enables
semiconductor device fabrication with increased device accuracy and precision, increased efficiency and
increased device yield, enabling a streamlined and simplified process flow, thereby decreasing the complexity
and lowering the costs of the manufacturing process and increasing throughput.
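
By way of illustration only, the treatment of sampling as a dynamic variable might be sketched as follows in Python. The function next_sampling_fraction, its parameters, the multiplicative step of 1.5, the 10% dead band, and the clamping limits are all assumptions made for this sketch; the description states only that sampling is increased or decreased based upon the three kinds of input listed above, not how the adjustment is computed.

```python
from statistics import pstdev
from typing import Sequence


def next_sampling_fraction(current: float,
                           recent: Sequence[float],
                           older: Sequence[float],
                           upstream_event: bool = False,
                           controller_identifying: bool = False,
                           min_fraction: float = 0.05,
                           max_fraction: float = 1.0,
                           step: float = 1.5,
                           band: float = 0.10) -> float:
    """Return the fraction of wafers to measure in the next run.

    `recent` and `older` are non-empty windows of a measured parameter.
    All numeric defaults are hypothetical values chosen for this sketch.
    """
    fraction = current

    # (2) upstream events and (3) run-to-run controller requirements:
    # either one calls for more data in this sketch.
    if upstream_event or controller_identifying:
        fraction *= step

    # (1) situational information: compare the variation in the recent
    # window with the variation in the older window.
    recent_sd = pstdev(recent)
    older_sd = pstdev(older)
    if older_sd > 0:
        ratio = recent_sd / older_sd
        if ratio > 1 + band:
            fraction *= step      # variation is growing: sample more
        elif ratio < 1 - band:
            fraction /= step      # variation is shrinking: sample less
    elif recent_sd > 0:
        fraction *= step          # variation appeared where there was none

    # Keep the result within the allowed range.
    return min(max_fraction, max(min_fraction, fraction))
```

A multiplicative step with a dead band is only one possible choice; it is used here because it avoids changing the sampling rate in response to small fluctuations in the estimated variation.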

The particular embodiments disclosed above are illustrative only, as the invention may be modified and
practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the
teachings herein. Furthermore, no limitations are intended to the details of construction or design herein shown,
other than as described in the claims below. It is therefore evident that the particular embodiments disclosed
above may be altered or modified and all such variations are considered within the scope and spirit of the
invention. In particular, every range of values (of the form, "from about a to about b," or, equivalently, "from
approximately a to b," or, equivalently, "from approximately a-b") disclosed herein is to be understood as
referring to the power set (the set of all subsets) of the respective range of values, in the sense of Georg Cantor.
Accordingly, the protection sought herein is as set forth in the claims below.
CLAIMS

1. A method comprising:

sampling (110) at least one parameter characteristic of processing performed on a workpiece in
at least one processing step (105);

modeling the at least one characteristic parameter sampled using an adaptive sampling
processing model (130), treating sampling as an integrated part of a dynamic control
environment, varying the sampling based upon at least one of situational information,
upstream events and requirements of run-to-run controllers; and

applying the adaptive sampling processing model (130) to modify (135,155,160) the processing
performed in the at least one processing step (105).

2. The method of claim 1, wherein sampling (110) the at least one parameter characteristic of the
processing performed on the workpiece in the at least one processing step (105) comprises monitoring (110) the
at least one characteristic parameter using an advanced process control (APC) system (120).

3. The method of claim 2, wherein monitoring (110) the at least one characteristic parameter
using the advanced process control (APC) system (120) comprises using the advanced process control (APC)
system (120) to monitor at least one tool variable (110) of a processing tool during the at least one processing
step (105).

4. The method of claim 1, wherein modeling the at least one characteristic parameter sampled
using the adaptive sampling processing model (130) comprises using an adaptive sampling processing
model (130) incorporating at least one of a model predictive control (MPC) controller and a
proportional-integral-derivative (PID) controller having at least one tuning parameter.

5. The method of claim 4, wherein using the adaptive sampling processing model (130)
incorporating the at least one of a model predictive control (MPC) controller and a
proportional-integral-derivative (PID) controller having the at least one tuning parameter comprises using the
adaptive sampling processing model (130) incorporating at least one of a closed-loop model predictive control
(MPC) controller and a closed-loop proportional-integral-derivative (PID) controller having the at least one
tuning parameter.

6. The method of claim 4, wherein applying the adaptive sampling processing model (130) to
modify (135,155,160) the processing performed in the at least one processing step (105) comprises
tuning (145,150) the at least one tuning parameter to improve (155) the processing performed in the at least one
processing step (105).

7. A method comprising:

sampling (110) at least one parameter characteristic of processing performed on a workpiece in
at least one processing step (105);

modeling the at least one characteristic parameter sampled using an adaptive sampling
processing model (130), treating sampling as an integrated part of a dynamic control
environment, varying the sampling based upon at least one of situational information
comprising at least one of an amount of variation in recent data and a rate of change in
the variation in the recent data, upstream events comprising at least one of
maintenance in a process upstream and changes in the process upstream, and
requirements of run-to-run controllers attempting to identify control model
parameters; and

applying the adaptive sampling processing model (130) to modify (135,155,160) the processing
performed in the at least one processing step (105).

8. The method of claim 7, wherein sampling the at least one parameter characteristic of the
processing performed on the workpiece in the at least one processing step (105) comprises monitoring (110) the
at least one characteristic parameter using an advanced process control (APC) system (120), and wherein
monitoring (110) the at least one characteristic parameter using the advanced process control (APC) system (120)
comprises using the advanced process control (APC) system (120) to monitor at least one tool variable (110) of a
rapid thermal processing tool during the at least one processing step (105).

9. The method of claim 7, wherein modeling the at least one characteristic parameter sampled
using the adaptive sampling processing model (130) comprises using an adaptive sampling processing
model (130) incorporating at least one of a model predictive control (MPC) controller and a
proportional-integral-derivative (PID) controller having at least one tuning parameter, wherein using the adaptive
sampling processing model (130) incorporating the at least one of a model predictive control (MPC) controller
and a proportional-integral-derivative (PID) controller having the at least one tuning parameter comprises using
the adaptive sampling processing model (130) incorporating at least one of a closed-loop model predictive
control (MPC) controller and a closed-loop proportional-integral-derivative (PID) controller having the at least
one tuning parameter, and wherein applying the adaptive sampling processing model (130) to
modify (135,155,160) the processing performed in the at least one processing step (105) comprises
tuning (145,150) the at least one tuning parameter to improve (155) the processing performed in the at least one
processing step (105).

10. A system comprising:

a tool for sampling (110) at least one parameter characteristic of processing performed on a
workpiece in at least one processing step (105);
a computer for modeling the at least one characteristic parameter sampled using an adaptive
sampling processing model (130), treating sampling as an integrated part of a dynamic
control environment, varying the sampling based upon at least one of situational
information, upstream events and requirements of run-to-run controllers; and
a controller for applying the adaptive sampling processing model (130) to
modify (135,155,160) the processing performed in the at least one processing step (105), wherein the
tool for sampling the at least one parameter characteristic of the processing performed on the workpiece
in the at least one processing step (105) comprises a monitor for monitoring (110) the at least one
characteristic parameter using an advanced process control (APC) system (120), wherein the advanced
process control (APC) system (120) monitors at least one tool variable (110) of at least one processing
tool during the at least one processing step (105), wherein the computer modeling the at least one
characteristic parameter sampled uses an adaptive sampling processing model (130) incorporating at
least one of a model predictive control (MPC) controller and a proportional-integral-derivative (PID)
controller having at least one tuning parameter, wherein the computer uses the adaptive sampling
processing model (130) incorporating at least one of a closed-loop model predictive control (MPC)
controller and a closed-loop proportional-integral-derivative (PID) controller having the at least one
tuning parameter, and wherein the controller applying the adaptive sampling processing model (130) to
modify (135,155,160) the processing performed in the at least one processing step (105) tunes the at
least one tuning parameter to improve (155) the processing performed in the at least one processing
step (105).